Shaminur

Reputation: 71

Is there an R function to extract the full taxonomy (phylum, class, order, family, ...) from a species taxonomic ID, a species name, or a genus name?

I have the taxonomic IDs of my species, and I can get the species and genus names from NCBI (https://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi). But I also want the higher ranks (phylum, class, order, and so on) for all of these entries.

I have tried the taxize package, but it does not work for a large dataset like this one:

specieslist <- c("Clostridium", "Clostridium",  "Achromobacter",    "Achromobacter",    "Acinetobacter",    "Acinetobacter",    "Acinetobacter",    "Acinetobacter",    "Acinetobacter",    "Acinetobacter",    "Acinetobacter",    "Acinetobacter",    "Acinetobacter",    "Acinetobacter",    "Acinetobacter",    "Acinetobacter",    "Actinomyces",  "Actinomyces",  "Aeromonas",    "Agrococcus",   "Alcanivorax",  "Alkalihalobacillus",   "Alloprevotella",   "Aminobacterium",   "Amniculibacterium",    "Anaerocolumna",    "Anaerocolumna",    "Anaerocolumna",    "Asticcacaulis",    "Atopobium",    "Bacillus", "Bacillus", "Bacteroidales",    "Bacteroides",  "Bacteroides",  "Bacteroides",  "Bacteroides",  "Bacteroides",  "Bacteroides",  "Barnesiella",  "Bifidobacterium",  "Blochmannia",  "Bordetella",   "Brevibacillus",    "Buchnera", "Burkholderia", "Butyricimonas",    "Campylobacter",    "Campylobacter",    "Campylobacter",    "Campylobacter",    "Campylobacter",    "Campylobacter",    "Campylobacter",    "Campylobacter",    "Campylobacter",    "Campylobacter",    "Campylobacter",    "Campylobacter",    "Campylobacter",    "Capnocytophaga",   "Capnocytophaga",   "Capnocytophaga",   "Chroococcidiopsis",    "Citrobacter",  "Clostridium",  "Clostridium",  "Clostridium",  "Clostridium",  "Corynebacterium",  "Corynebacterium",  "Corynebacterium",  "Corynebacterium",  "Cutibacterium",    "Dialister",    "Dolosigranulum",   "Enterobacter", "Enterococcus", "Entomoplasma", "Escherichia",  "Escherichia",  "Escherichia",  "Eubacterium",  "Fermentimonas",    "Frankia",  "Fusobacterium",    "Fusobacterium",    "Fusobacterium",    "Fusobacterium",    "Fusobacterium",    "Fusobacterium",    "Gemella",  "Haemophilus",  "Haemophilus",  "Halomonas",    "Hydrogenophaga",   "Ilyobacter",   "Klebsiella",   "Klebsiella",   "Klebsiella",   "Klebsiella",   "Klebsiella",   "Kocuria",  "Kytococcus",   "Lachnoanaerobaculum",  "Lachnospira",  "Lachnospiraceae",  "Lachnospiraceae",  "Lacrimispora", 
"Lactobacillus",    "Lactobacillus",    "Lactobacillus",    "Lactobacillus",    "Lactobacillus",    "Lancefieldella",   "Lautropia",    "Leptotrichia", "Leptotrichia", "Leptotrichia", "Leptotrichia", "Leptotrichia", "Leptotrichia", "Leptotrichia", "Leptotrichia", "Ligilactobacillus",    "Limosilactobacillus",  "Luteimonas",   "Lysinibacillus",   "Lysobacter",   "Lysobacter",   "Lysobacter",   "Magnetospirillum", "Marivirga",    "Megasphaera",  "Megasphaera",  "Meiothermus",  "Methylobacterium", "Methylobacterium", "Methylobacterium", "Methylobacterium", "Methylobacterium", "Microbacterium",   "Microbacterium",   "Microbacterium",   "Microbacterium",   "Micrococcus",  "Muribaculaceae",   "Muribaculum",  "Muribaculum",  "Neisseria",    "Neisseria",    "Neisseria",    "Neisseria",    "Neisseria",    "Neisseria",    "Neisseria",    "Nocardioides", "Nocardioides", "Paludibacter", "Pantoea",  "Paracoccus",   "Paracoccus",   "Paraprevotella",   "Pasteurella",  "Petrimonas",   "Phenylobacterium", "Phocaeicola",  "Phocaeicola",  "Phocaeicola",  "Phyllobacterium",  "Polaribacter", "Pontibacter",  "Pontibacter",  "Porphyromonas",    "Porphyromonas",    "Porphyromonas",    "Porphyromonas",    "Prevotella",   "Prevotella",   "Prevotella",   "Prevotella",   "Prevotella",   "Prevotella",   "Prevotella",   "Prevotella",   "Prevotella",   "Prevotella",   "Prevotella",   "Proteus",  "Pseudoleptotrichia",   "Pseudomonas",  "Pseudonocardia",   "Pseudonocardia",   "Raoultella",   "Rheinheimera", "Romboutsia",   "Roseivirga",   "Roseococcus",  "Rothia",   "Rothia",   "Rubrobacter",  "Rubrobacter",  "Rufibacter",   "Saccharomonospora",    "Saccharopolyspora",    "Saccharopolyspora",    "Salinivirga",  "Salmonella",   "Schaalia", "Sedimentisphaera", "Selenomonas",  "Selenomonas",  "Selenomonas",  "Selenomonas",  "Selenomonas",  "Selenomonas",  "Shigella", "Skermanella",  "Sphingosinicella", "Spirosoma",    "Staphylococcus",   "Staphylococcus",   "Stenotrophomonas", "Streptococcus",    
"Streptococcus",    "Streptococcus",    "Streptococcus",    "Streptococcus",    "Streptococcus",    "Streptococcus",    "Streptococcus",    "Streptococcus",    "Streptococcus",    "Streptococcus",    "Streptococcus",    "Streptococcus",    "Streptomyces", "Tannerella",   "Tannerella",   "Thermovirga",  "Treponema",    "Treponema",    "Treponema",    "Treponema",    "Treponema",    "Treponema",    "Veillonella",  "Veillonella",  "Veillonella",  "Veillonella",  "Veillonella")


library(taxize)
res <- tax_name(query = specieslist, get = c("phylum", "class", "order", "family", "genus"), db = "ncbi")
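Much of the load in this call comes from duplicates: `specieslist` repeats many genera (Acinetobacter, Streptococcus, Prevotella, ...), so each repeat triggers its own NCBI lookup. A minimal sketch, assuming `tax_name()` returns a data.frame with a `query` column holding the input name: the hypothetical helper `query_unique()` looks up each distinct name once, in chunks (`chunk_size = 50` is an arbitrary choice), skips a chunk that errors instead of losing the whole run, and maps the results back onto the original vector. Setting a free NCBI API key in the `ENTREZ_KEY` environment variable, which taxize reads, also raises NCBI's E-utilities limit from 3 to 10 requests per second.

```r
# Look up each distinct name only once, in chunks, and map the results back
# onto the original (duplicated) vector. `lookup` is any function that takes a
# character vector and returns a data.frame with a `query` column.
query_unique <- function(x, lookup, chunk_size = 50) {
  uniq <- unique(x)
  chunks <- split(uniq, ceiling(seq_along(uniq) / chunk_size))
  results <- lapply(chunks, function(q) {
    # A failed chunk is reported and skipped instead of aborting the whole run
    tryCatch(lookup(q), error = function(e) {
      warning("lookup failed for chunk starting at '", q[1], "': ",
              conditionMessage(e), call. = FALSE)
      NULL
    })
  })
  res <- do.call(rbind, results)   # NULL entries from failed chunks are dropped
  if (is.null(res)) stop("every lookup chunk failed")
  out <- res[match(x, res$query), , drop = FALSE]
  out$query <- x                   # names from failed chunks appear as NA rows
  rownames(out) <- NULL
  out
}

# Hypothetical usage with taxize (needs network access):
# library(taxize)
# Sys.setenv(ENTREZ_KEY = "your-ncbi-api-key")
# res <- query_unique(specieslist, function(q)
#   tax_name(q, get = c("phylum", "class", "order", "family", "genus"), db = "ncbi"))
```

Passing the lookup as a function keeps the deduplication and chunking independent of the data source, so the same helper could wrap a different taxize call.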

Any suggestions?

Upvotes: 5

Views: 2823

Answers (2)

sckott

Reputation: 5893

(taxize maintainer here)

If taxize is too slow for you, another option is taxizedb, which uses NCBI as its data source by default. taxizedb is similar to taxize, but it queries local database dumps instead of making HTTP requests; the trade-off is the initial setup time needed to download the databases.

install.packages("taxizedb")
library(taxizedb)
db_download_ncbi()  # one-time download of the local NCBI taxonomy database
ids <- name2taxid(unique(specieslist), out_type = "summary")
classification(ids$id)

Then you can pull out whichever ranks you want from each data.frame.
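One way to do that last step, sketched under the assumption that `classification()` returns a named list whose elements are data.frames with `name`, `rank`, and `id` columns (or `NA` for IDs it could not resolve): the hypothetical helper `ranks_wide()` collapses that list into a single table with one row per taxon ID and one column per requested rank, leaving `NA` wherever a lineage lacks a rank.

```r
# Collapse a list of lineage data.frames (columns name, rank, id) into one
# row per taxon ID with one column per requested rank.
ranks_wide <- function(lineages,
                       ranks = c("phylum", "class", "order", "family", "genus")) {
  ids <- names(lineages)
  if (is.null(ids)) ids <- as.character(seq_along(lineages))
  rows <- lapply(lineages, function(df) {
    # Unresolved IDs come back as NA instead of a data.frame
    if (!is.data.frame(df)) return(setNames(rep(NA_character_, length(ranks)), ranks))
    # match() yields NA for ranks missing from this lineage
    setNames(as.character(df$name[match(ranks, df$rank)]), ranks)
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  out <- data.frame(taxid = ids, out, stringsAsFactors = FALSE)
  rownames(out) <- NULL
  out
}

# Hypothetical usage, with `ids` from name2taxid():
# wide <- ranks_wide(classification(ids$id))
```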

Upvotes: 3

Shaminur

Reputation: 71

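TaxonKit can do this from the command line (see https://bioinf.shenwei.me/taxonkit/usage/#usage-and-examples). Starting from a file txtid.txt with one taxid per line: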

taxonkit lineage txtid.txt | tee lineage.txt

cat lineage.txt \
    | taxonkit reformat \
    | csvtk -H -t cut -f 1,3 \
    | csvtk -H -t sep -f 2 -s ';' -R \
    | csvtk add-header -t -n taxid,kingdom,phylum,class,order,family,genus,species \
    | csvtk pretty -t
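To get the TaxonKit result back into R instead of a pretty-printed table, one option is to skip the csvtk steps and read the `taxonkit reformat` output directly. A minimal sketch, assuming the output was saved with `cat lineage.txt | taxonkit reformat > reformatted.txt` (a hypothetical file name) and that column 3 holds the seven-rank lineage matching the header in the pipeline above: the hypothetical helper `read_taxonkit_reformat()` splits that column on `;` into one column per rank.

```r
# Read `taxonkit reformat` output (taxid, full lineage, formatted lineage)
# and split the formatted lineage into one column per rank.
read_taxonkit_reformat <- function(path,
                                   ranks = c("kingdom", "phylum", "class", "order",
                                             "family", "genus", "species")) {
  tab <- read.delim(path, header = FALSE, quote = "", colClasses = "character")
  parts <- strsplit(tab[[3]], ";", fixed = TRUE)
  mat <- t(vapply(parts, function(p) {
    p <- p[seq_along(ranks)]            # pad with NA (strsplit drops a trailing empty field)
    p[which(p == "")] <- NA_character_  # ranks missing from the lineage become NA
    p
  }, character(length(ranks))))
  colnames(mat) <- ranks
  data.frame(taxid = tab[[1]], mat, stringsAsFactors = FALSE)
}

# Hypothetical usage:
# lineages <- read_taxonkit_reformat("reformatted.txt")
```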

Upvotes: 0
