Usage
Quickstart (for the impatient)
To type polysaccharide loci from genome assemblies:
kaptive assembly <database> <assemblies> -o kaptive_results.tsv
This will run kaptive assembly with the default parameters, and
produce a table detailing the best match locus, predicted phenotype,
confidence score and detailed typing information for each input genome
assembly in the file called kaptive_results.tsv.
Detailed usage
We designed Kaptive 3 to be easier to use on the command-line than
previous versions by structuring the program as a series of sub-commands
that follow the general pattern of kaptive <mode> <database> <input>.
There are three modes:
- assembly: type assemblies
- convert: convert Kaptive results to different formats
- extract: extract features from Kaptive databases in different formats
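The kaptive <mode> <database> <input> pattern lends itself to scripting. The sketch below shows one way to assemble such a command line from Python. Note that build_kaptive_command is a hypothetical helper written for this example, not part of Kaptive, and it only builds the argument list; it does not run anything.

```python
import shlex

MODES = ("assembly", "convert", "extract")  # the three modes listed above


def build_kaptive_command(mode, database, inputs=(), options=()):
    """Return an argv list shaped like: kaptive <mode> <database> <input...> [options].

    Hypothetical helper for illustration only; it is not part of Kaptive.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return ["kaptive", mode, str(database), *map(str, inputs), *map(str, options)]


cmd = build_kaptive_command(
    "assembly",
    "kpsc_k",
    inputs=["assembly.fasta"],
    options=["-o", "kaptive_results.tsv"],
)
print(shlex.join(cmd))  # kaptive assembly kpsc_k assembly.fasta -o kaptive_results.tsv
```

If Kaptive is installed, the resulting list can be handed to subprocess.run(cmd, check=True). Keeping the command as a list avoids shell quoting problems with unusual file names.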
Note
To see the full list of commands and options, run kaptive -h/--help.
kaptive assembly
Given a Kaptive database and a bacterial genome assembly,
kaptive assembly performs three main tasks:
- Determine the most likely locus type of the genome assembly.
- Reconstruct the biosynthetic gene cluster from the assembly contig sequences.
- Predict the corresponding serotype/phenotype of the genome assembly.
Note
As of version 3, Kaptive no longer supports allelic (wzi, wzc) typing.
To perform K locus typing on a directory of Klebsiella pneumoniae assemblies, you would run:
kaptive assembly kpsc_k <path/to/assemblies>/*.fasta
Here we have told Kaptive to perform typing of assemblies with
assembly and used the database keyword kpsc_k to specify the
Klebsiella pneumoniae K locus database. All other parameters are set
to the default.
Database keywords are a handy short-cut for using the databases distributed with Kaptive and located in the Kaptive package data. Alternatively, you can specify the full path to your own database.
You may also want to specify the locations and/or filenames of the output files using the following options:
Note, text outputs accept '-' for stdout
-o , --out Output file to write/append tabular results to (default: stdout)
-f [], --fasta [] Turn on fasta output
Accepts a single file or a directory (default: cwd)
-j [], --json [] Turn on JSON lines output
Optionally choose file (can be existing) (default: kaptive_results.json)
-s [], --scores [] Dump locus score matrix to tsv (typing will not be performed!)
Optionally choose file (can be existing) (default: stdout)
--no-header Suppress header line
Example:
kaptive assembly kpsc_k <assemblies> -o kaptive_results.tsv -f -j
This will output a tabular file called kaptive_results.tsv, a fasta
file for each assembly called {assembly}_kaptive_results.fna and a JSON
lines file called kaptive_results.json.
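For downstream analysis, the tabular output can be loaded with Python's standard csv module. The following is a minimal sketch, assuming the header line was kept (that is, --no-header was not used). The column names and values in the example data are placeholders chosen for illustration, not Kaptive's actual header.

```python
import csv
import io


def read_kaptive_tsv(handle):
    """Yield one dict per result row, keyed by whatever the header line contains."""
    yield from csv.DictReader(handle, delimiter="\t")


# Stand-in for open("kaptive_results.tsv"); column names and values below are
# placeholders for illustration, not Kaptive's real header.
example = io.StringIO(
    "Assembly\tBest match locus\tConfidence\n"
    "sample_1\tKL1\tTypeable\n"
    "sample_2\tKL2\tUntypeable\n"
)

rows = list(read_kaptive_tsv(example))
typeable = [row["Assembly"] for row in rows if row["Confidence"] == "Typeable"]
print(typeable)  # ['sample_1']
```

To use this on a real file, replace the StringIO object with an open file handle and substitute the column names found in your own results table.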
Warning
It is possible to write all text formats (TSV, JSON and FASTA) to the same file (including stdout); however, this is not recommended for downstream analysis.
Advanced options
Advanced users may wish to customise Kaptive's scoring options (for picking the best match locus), confidence options (for marking matches as 'Typeable' or 'Untypeable') or database parsing options. We recommend keeping the default options for standard typing using the Klebsiella and/or A. baumannii databases distributed with Kaptive.
--min-cov            Minimum gene %coverage (blen/q_len*100) to be used for scoring (default: 50.0)
--score-metric       Metric for scoring each locus (default: 0)
                       0: AS (alignment score of genes found)
                       1: mlen (matching bases of genes found)
                       2: blen (aligned bases of genes found)
                       3: q_len (query length of genes found)
--weight-metric      Weighting for the 1st stage of the scoring algorithm (default: 3)
                       0: No weighting
                       1: Number of genes found
                       2: Number of genes expected
                       3: Proportion of genes found
                       4: blen (aligned bases of genes found)
                       5: q_len (query length of genes found)
--n-best             Number of best loci from the 1st round of scoring to be
                     fully aligned to the assembly (default: 2)
--gene-threshold     Species-level locus gene identity threshold (default: database specific)
--max-other-genes    Typeable if <= other genes (default: 1)
--percent-expected   Typeable if >= % expected genes (default: 50)
--below-threshold    Typeable if any genes are below threshold (default: False)
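The coverage formula and the confidence thresholds above can be read as simple predicates. The sketch below is one interpretation of the option descriptions, not Kaptive's implementation. It assumes that the three confidence criteria must all hold at once, that thresholds are inclusive, and that with --below-threshold left at False any gene below the identity threshold makes the result untypeable. All function and parameter names are hypothetical.

```python
def gene_coverage(blen, q_len):
    """Gene %coverage as written in the --min-cov help text: blen / q_len * 100."""
    return blen / q_len * 100


def passes_min_cov(blen, q_len, min_cov=50.0):
    """Assumed reading: a gene is used for scoring when coverage >= min_cov."""
    return gene_coverage(blen, q_len) >= min_cov


def is_typeable(n_other_genes, n_expected_found, n_expected_total, n_below_threshold,
                max_other_genes=1, percent_expected=50.0, allow_below_threshold=False):
    """Assumed combination of the confidence options (all criteria must pass)."""
    pct_found = n_expected_found / n_expected_total * 100
    if n_other_genes > max_other_genes:          # --max-other-genes
        return False
    if pct_found < percent_expected:             # --percent-expected
        return False
    if n_below_threshold and not allow_below_threshold:  # --below-threshold
        return False
    return True


print(gene_coverage(450, 900))                     # 50.0
print(is_typeable(0, 18, 20, 0))                   # True: 90% of expected genes, no extras
print(is_typeable(2, 20, 20, 0))                   # False: too many other genes
print(is_typeable(0, 20, 20, 1))                   # False: a gene is below threshold
print(is_typeable(0, 20, 20, 1, allow_below_threshold=True))  # True
```

Writing the rules out this way makes it easier to reason about how changing one option, for example raising --max-other-genes, would shift borderline results, but the actual behaviour should be confirmed against Kaptive itself.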
The database options described under kaptive extract are also available, along with these other options:
-V, --verbose Print debug messages to stderr
-v , --version Show version number and exit
-h , --help Show this help message and exit
-t , --threads Number of threads for alignment (default: maximum available CPUs / 32)
kaptive convert
The convert command allows you to convert the Kaptive results JSON
file into a range of useful formats, including:
- tsv: Tabular output (tsv)
- json: JSON lines format (same as input but optionally filtered)
- fna: Locus nucleotide sequences in fasta format.
- ffn: Gene nucleotide sequences in fasta format.
- faa: Protein sequences in fasta format.
Warning
The convert command is only compatible with JSON files from Kaptive
v3.0.0 onwards.
Usage
...
General usage is as follows:
kaptive convert <db> <json> [formats] [options]
Inputs:
db path/keyword Kaptive database path or keyword
json Kaptive JSON lines file or - for stdin
Formats:
Note, text outputs accept '-' for stdout
-t [], --tsv [] Convert to tabular format in file (default: stdout)
-j [], --json [] Convert to JSON lines format in file (default: stdout)
--fna [] Convert to locus nucleotide sequences in fasta format
Accepts a single file or a directory (default: cwd)
--ffn [] Convert to locus gene nucleotide sequences in fasta format
Accepts a single file or a directory (default: cwd)
--faa [] Convert to locus gene protein sequences in fasta format
Accepts a single file or a directory (default: cwd)
--no-header Suppress header line
Filter options:
-r , --regex Python regular-expression to select JSON lines (default: All)
-l [ ...], --loci [ ...]
Space-separated list to filter locus names (default: All)
-s [ ...], --samples [ ...]
Space-separated list to filter sample names (default: All)
Note
Filters take precedence in the order listed above: --regex first, then --loci, then --samples.
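Because the results are stored as JSON lines, the same kind of selection can also be done in a script before the data reaches kaptive convert. The sketch below filters lines first by a regular expression on the raw text and then by sample name. The field names sample_name and best_match are assumptions made for this example and may not match the keys in real Kaptive output; select_json_lines is a hypothetical helper, not part of Kaptive.

```python
import json
import re


def select_json_lines(lines, regex=None, samples=None, sample_key="sample_name"):
    """Yield JSON lines that pass a regex on the raw text, then a sample-name filter.

    sample_key is an assumed field name; adjust it to match your JSON.
    """
    pattern = re.compile(regex) if regex else None
    wanted = set(samples) if samples else None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if pattern and not pattern.search(line):
            continue  # regex filter is applied first
        if wanted is not None and json.loads(line).get(sample_key) not in wanted:
            continue  # then the sample filter
        yield line


# Made-up records for illustration only.
lines = [
    json.dumps({"sample_name": "sample_1", "best_match": "KL1"}),
    json.dumps({"sample_name": "sample_2", "best_match": "KL10"}),
    json.dumps({"sample_name": "sample_3", "best_match": "KL1"}),
]

kept = list(select_json_lines(lines, regex=r'"KL1"', samples=["sample_3"]))
print([json.loads(line)["sample_name"] for line in kept])  # ['sample_3']
```

The selected lines could then be written to a file or piped into kaptive convert, which accepts - for stdin.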
For example, to convert the JSON file to a tabular format, run either of the following commands:
kaptive convert kpsc_k kaptive_results.json --tsv kaptive_results.tsv
cat *.json | kaptive convert kpsc_k - --tsv - > kaptive_results.tsv
To output multiple formats, you can run:
kaptive convert kpsc_k kaptive_results.json --tsv kaptive_results.tsv --fna - --faa proteins/
Where the tabular results will be written to kaptive_results.tsv, the
locus nucleotide sequences will be written to stdout, and the protein
sequences will be written to the directory proteins/ with the
filenames {assembly}_kaptive_results.faa.
Warning
It is possible to write all text formats (TSV, JSON, FNA, FAA and FFN) to the same file (including stdout); however, this is not recommended for downstream analysis.
kaptive extract
Kaptive 3.0.0 and above includes a new command-line mode extract that
allows you to extract features from a Kaptive database in the following
formats:
- fna: Locus nucleotide sequences in fasta format.
- ffn: Gene nucleotide sequences in fasta format.
- faa: Protein sequences in fasta format.
Usage
General usage is as follows:
kaptive extract <db> [formats] [options]
Formats:
Note, text outputs accept '-' for stdout
--fna [] Convert to locus nucleotide sequences in fasta format
Accepts a single file or a directory (default: cwd)
--ffn [] Convert to locus gene nucleotide sequences in fasta format
Accepts a single file or a directory (default: cwd)
--faa [] Convert to locus gene protein sequences in fasta format
Accepts a single file or a directory (default: cwd)
Database options:
--locus-regex Python regular-expression to match locus names in db source note
--type-regex Python regular-expression to match locus types in db source note
--filter Python regular-expression to select loci to include in the database
Note
These options are useful for customising the database to your needs, for example, to include only a subset of loci or to change the way locus names and types are parsed from the source note.
Other options:
-V, --verbose Print debug messages to stderr
-v , --version Show version number and exit
-h , --help Show this help message and exit
For example, to extract the locus nucleotide sequences from the Klebsiella pneumoniae K locus primary reference database in fasta format, run:
kaptive extract kp_k --fna k_loci.fna
To extract all protein sequences from KL1 and KL2, run either one of the following:
kaptive extract kp_k --filter "^KL(1|2)$" --faa KL1_KL2_proteins.faa
kaptive extract kp_k --filter "^KL(1|2)$" --faa - > KL1_KL2_proteins.faa
To do the same but output each locus to a separate file, run either:
kaptive extract kp_k --filter "^KL(1|2)$" --faa
kaptive extract kp_k --filter "^KL(1|2)$" --faa protein_files/
Which would create two files: KL1.faa and KL2.faa.
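The anchors in "^KL(1|2)$" matter. The short sketch below uses Python's re module to show which locus names the pattern selects compared with an unanchored version. The list of names is made up for illustration, and the sketch assumes Kaptive tests the pattern against the bare locus name.

```python
import re

names = ["KL1", "KL2", "KL10", "KL12", "KL21", "KL102", "KL30"]  # illustrative names

anchored = re.compile(r"^KL(1|2)$")
unanchored = re.compile(r"KL(1|2)")

# With ^ and $, the whole name must be exactly KL1 or KL2.
print([n for n in names if anchored.search(n)])
# ['KL1', 'KL2']

# Without anchors, any name containing KL1 or KL2 is selected.
print([n for n in names if unanchored.search(n)])
# ['KL1', 'KL2', 'KL10', 'KL12', 'KL21', 'KL102']
```

Testing a pattern this way before passing it to --filter is a quick check that it selects only the loci you intend.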
kaptive assembly kpsc_k assembly.fasta -j kaptive_results.json
Warning
It is possible to write all text formats (fna, faa and ffn)
to the same file (including stdout); however, this is not recommended
for downstream analysis.
API
Whilst Kaptive isn't designed to be a fully-fledged API, it is possible
to use it as a module in your own Python scripts. For typing assemblies,
you can use the kaptive.assembly.typing_pipeline function, which takes
an assembly and a kaptive.database.Database object as input and
returns a kaptive.typing.TypingResult object.
from pathlib import Path

from kaptive.assembly import typing_pipeline
from kaptive.database import load_database

# Load the Klebsiella K locus database once and pass it to the typing pipeline
db = load_database('kpsc_k')
for assembly in Path('assemblies').glob('*.fna.gz'):
    result = typing_pipeline(assembly, db)
    if result:  # If the assembly was successfully typed
        print(result.format('tsv'), end='')  # TSV output already ends in a newline
For example, if you wanted to perform K and O locus typing on a single assembly, you could do the following:
# Here we pass database keywords; each database is loaded inside the typing pipeline
for db in ['kpsc_k', 'kpsc_o']:
    result = typing_pipeline('test/kpsc/2018-01-389.fasta', db)
    if result:  # If the assembly was successfully typed
        print(result.format('tsv'), end='')  # TSV output already ends in a newline
Note
By default, typing_pipeline runs minimap2 on all available
CPUs; this can be controlled with the threads parameter.