Interpreting the results¶
The four most important columns in the Kaptive tabular output:
Best match locusindicating the locus that is best supported by the available sequence dataBest match typeindicating the predicted phenotype based on that associated with theBest match locusand taking into account any special phenotype logic e.g. where a known essential gene is truncated, Kaptive will report theBest match typeas 'Capsule null'.Match confidenceindicates a qualitative level of confidence that the reportedBest match locusis correct (see below).Problemsindicates the features of the assembly locus that may impactMatch confidence
Note
The Match confidence relates to the locus match, and is not a direct
indication of confidence in the Best match type.
Confidence score¶
Kaptive will indicate the best matching locus and its confidence in the locus match. The critera mentioned below can be tweaked using the confidence options.
Typeable loci¶
The locus was found in a single piece in the query assembly with no genes below the minimum translated identity according to the locus thresholds and:
- no missing genes
- no (non-truncated) unexpected genes (genes from other loci) inside the locus region of the assembly
OR
The locus was found in more than one piece in the query assembly with no genes below the minimum translated identity according to the locus thresholds and:
- no less than N% missing genes (default: 50%)
- no more than N unexpected gene (genes from other loci) inside the locus region of the assembly (default: 1)
These criteria were designed in consideration of the locus definition rules (i.e. that each locus represents a unique set of genes defined at a given minimum translated identity threshold) and following systematic analysis of Kaptive outputs for draft genome assemblies compared against manually confirmed loci determined from matched completed genomes.
We allow some flexibility with regards to missing genes or additional genes found within the locus when this region of the query assembly is fragmented, because it can be difficult to distinguish genuine from spurious matches for fragmented genes. Fragmentation is common among K. pneumoniae K loci, particularly when the genomes were sequenced using the Illumina technology with the Illumina XT library prep (see FAQs for more details).
Untypeable loci¶
These are loci that do not meet the above criteria. We recommend that users do not accept these results unless they are able to perform manual exploration of the data.
Problems¶
?= the match was in a multiple pieces, possibly due to a poor match or discontiguous assembly. The number of pieces is indicated by the integer directly following the?symbol).-= genes expected in the locus were not found.+= extra genes were found in the locus.*= one or more expected genes was found but with translated identity below the minimum threshold.!= one or more genes was found but truncated.
Exploring the other columns¶
Many users will not want or need to look beyond the columns described
above. However, the rest of the Kaptive output may be useful for those
wishing to investigate loci marked with Problems or explore locus
variation in more detail. Interesting features include:
- Missing genes may indicate a novel locus or deletion variant is present in the assembly
- Length discrepancies can indicate a novel locus or deletion variant is present in the assembly. For Klebsiella K loci, positive length discrepancies of >700bp often indicate insertion sequence insertions resulting in so called 'IS variants' of the locus.
- Other genes inside the locus may indicate a novel locus (with some exceptions, see the FAQs)
- Truncated genes may have an impact on the resultant phenotype. Kaptive will consider truncations when reporting predicting phenotypes, but it currently considers only gene truncations for which there is good supporting evidence in the literature, and such evidence is very limited.
See the tutorials for our tips on investigating loci in more detail outside of Kaptive.