| Whole Genome Alignments |
This folder contains two subfolders, "Representative Dataset" and "Clonal Group Dataset". "Representative Dataset" contains whole genome alignments of the representative genomes for each species with ≥10 representative strains; files are named “Species_taxid.whole_genome_aln.fasta”. "Clonal Group Dataset" contains whole genome alignments for each clonal group with ≥10 strains; files are named “Species_taxid.CGid.whole_genome_aln.fasta”. Within each alignment, every genome is named by its "biosample-accession". Both datasets are compressed in 7z format. |
https://zenodo.org/records/14880463 |
| SNP Matrix |
This folder also contains two subfolders, "Representative Dataset" and "Clonal Group Dataset". They hold the core-genome SNP matrix files corresponding to the whole genome alignments in the Whole Genome Alignments folder. Files are named “Species_taxid.snp.matrix” (Representative Dataset) or “Species_taxid.CGid.snp.matrix” (Clonal Group Dataset). In each SNP matrix, rows represent SNP positions in the reference genome and columns represent strains. Data are compressed in 7z format. |
https://zenodo.org/records/14880463 |
| Pangenome |
This folder contains the pan-gene presence/absence matrices generated by Panaroo for species with ≥10 representative strains. In each matrix, rows represent pan-genes and columns represent the representative strains; “1” indicates that a gene is present and “0” that it is absent. Files are named “Species_taxid.gene_presence_absence.Rtab”. Data are compressed in gzip format. |
https://zenodo.org/records/14880463 |
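
The naming patterns and file layouts described above can be read with a few lines of standard-library Python. The following is only a minimal sketch, not the authors' tooling. It assumes that the taxid is the text after the last underscore of the "Species_taxid" prefix, that the prefix contains no dots, that the alignments are ordinary FASTA, and that the .Rtab files are tab-separated with a header row whose first cell labels the gene column (the usual Panaroo layout). The function names and the example file name are hypothetical. The SNP matrix is not parsed here because its delimiter is not stated.

```python
import csv
import io
from typing import Dict, Iterator, List, Optional, Tuple

# Suffixes taken from the naming patterns given in the table.
KNOWN_SUFFIXES = (
    ".whole_genome_aln.fasta",
    ".snp.matrix",
    ".gene_presence_absence.Rtab",
)


def parse_dataset_filename(name: str) -> Dict[str, Optional[str]]:
    """Split a file name into species, taxid, and (optional) clonal group id.

    Assumption: "Species_taxid" has no dots and the taxid follows the
    last underscore; anything between the prefix and the suffix is the CGid.
    """
    for suffix in KNOWN_SUFFIXES:
        if name.endswith(suffix):
            stem = name[: -len(suffix)]
            break
    else:
        raise ValueError(f"unrecognized file name: {name}")
    prefix, _, cg_id = stem.partition(".")
    species, _, taxid = prefix.rpartition("_")
    return {"species": species, "taxid": taxid, "clonal_group": cg_id or None}


def read_fasta(handle) -> Iterator[Tuple[str, str]]:
    """Yield (genome name, aligned sequence) pairs from a FASTA alignment."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep the first whitespace-delimited token of the header.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def read_rtab(handle) -> Tuple[List[str], Dict[str, List[int]]]:
    """Read a presence/absence matrix: returns (strains, {gene: [0/1, ...]})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    strains = header[1:]
    matrix = {row[0]: [int(v) for v in row[1:]] for row in reader if row}
    return strains, matrix


if __name__ == "__main__":
    # Hypothetical file name and toy matrix, for illustration only.
    print(parse_dataset_filename("Escherichia_coli_562.CG1.whole_genome_aln.fasta"))
    toy = io.StringIO("Gene\tS1\tS2\ngroup_1\t1\t0\n")
    strains, matrix = read_rtab(toy)
    # Number of strains carrying each pan-gene.
    print({gene: sum(row) for gene, row in matrix.items()})
```

For the gzip-compressed pangenome files, a text handle from `gzip.open(path, "rt")` can be passed directly to `read_rtab`. The 7z archives need to be extracted with a 7z-capable tool before the alignment files are opened.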