Both long- and short-term epidemiology are fundamental to disease control and require accurate typing of bacterial isolates. The adoption of whole genome sequencing in many public health laboratories has led to an explosion of genomic data that could describe strain relatedness with high sensitivity and accuracy. Previous efforts to build typing schemes from these data have mainly focussed on outbreak detection or have used clustering methods to identify larger groups of isolates.
We have developed multilevel genome typing (MGT), which uses multiple hierarchical multilocus sequence typing (MLST) schemes of increasing size to examine genetic relatedness at resolutions ranging from 7-gene MLST to core genome MLST. Rather than relying on clustering methods, which can produce unstable naming schemes, MGT derives each identifier directly from the sequence. The string of sequence types from each scheme, known as a genome type (GT), does not change once assigned. We implemented this system for Salmonella enterica serovar Typhimurium and typed 9799 isolates with publicly available data. Previously described S. Typhimurium populations can be identified and named, such as the DT104 multidrug-resistant lineage (GT 19-2-11) and two invasive lineages of African isolates (GT 313-2-3 and 313-2-752). Further, we show that outbreak detection clusters derived from MGT accurately distinguished 54 outbreak isolates from 5 background isolates across five outbreaks.
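To make the idea of a sequence-derived, stable genome type concrete, the toy sketch below assigns allele numbers by exact sequence match, assigns a sequence type (ST) per scheme by exact allele-profile match, and joins the STs into a GT string. It is a minimal illustration under stated assumptions, not the published MGT implementation: the class and function names (`AlleleDatabase`, `Scheme`, `genome_type`), the three-level nested schemes, and the short locus sequences are all hypothetical, and real-world concerns such as missing or partial loci are ignored.

```python
from typing import Dict, List, Tuple


class AlleleDatabase:
    """Hypothetical per-locus allele registry: each distinct sequence at a
    locus gets the next integer allele number; numbers are never reused."""

    def __init__(self) -> None:
        self._alleles: Dict[str, Dict[str, int]] = {}

    def call(self, locus: str, sequence: str) -> int:
        known = self._alleles.setdefault(locus, {})
        if sequence not in known:
            known[sequence] = len(known) + 1
        return known[sequence]


class Scheme:
    """One MLST-style scheme: a fixed list of loci plus a registry mapping
    each distinct allele profile to a sequence type (ST)."""

    def __init__(self, loci: List[str]) -> None:
        self.loci = loci
        self._sts: Dict[Tuple[int, ...], int] = {}

    def assign_st(self, alleles: Dict[str, int]) -> int:
        profile = tuple(alleles[locus] for locus in self.loci)
        if profile not in self._sts:
            # A new profile gets the next ST; existing STs are never renumbered.
            self._sts[profile] = len(self._sts) + 1
        return self._sts[profile]


def genome_type(schemes: List[Scheme], db: AlleleDatabase,
                genome: Dict[str, str]) -> str:
    """Call alleles from locus sequences, then join the ST from every scheme
    (smallest scheme first) into a GT string such as '1-1-2'."""
    alleles = {locus: db.call(locus, seq) for locus, seq in genome.items()}
    return "-".join(str(s.assign_st(alleles)) for s in schemes)


# Hypothetical three-level hierarchy (assumed nested: each larger scheme
# contains the loci of the smaller one).
db = AlleleDatabase()
schemes = [Scheme(["a"]), Scheme(["a", "b"]), Scheme(["a", "b", "c"])]

g1 = {"a": "ACGT", "b": "GGCC", "c": "TTAA"}
g2 = {"a": "ACGT", "b": "GGCC", "c": "TTAT"}  # differs from g1 only at locus c
g3 = {"a": "ACGA", "b": "GGCC", "c": "TTAA"}  # differs from g1 only at locus a

print(genome_type(schemes, db, g1))  # 1-1-1
print(genome_type(schemes, db, g2))  # 1-1-2
print(genome_type(schemes, db, g3))  # 2-2-3
print(genome_type(schemes, db, g1))  # 1-1-1 (unchanged on re-typing)
```

Because each identifier is a lookup on an exact sequence or profile rather than the output of a clustering run, adding new genomes can only create new numbers; it never renames existing ones. Isolates that share a GT prefix (here `1-1-`) are identical at the lower-resolution levels and diverge only at the finest scheme.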
MGT provides a universal and stable nomenclature at multiple resolutions for bacterial strains and could be implemented as an internationally standardisable strain identification system that accommodates both long-term and short-term epidemiological needs. It will allow better temporal and spatial tracking of bacterial clones that are associated with clinically relevant phenotypes such as antimicrobial resistance and disease severity, facilitating better diagnosis, clinical management, disease control and prevention.