Geomicro

Reputation: 474

Tip labels and ASV names across a phylo object, OTU table, and tax table are not matching

I am using match.phylo.data() from picante to match an OTU table, a taxonomy table, and tree tip labels in R. I can save the output as a list with no errors or warnings (typically I get a warning about dropped tips if there are any), but when I use the output for diversity metrics, I only get warnings and errors saying that no tip names match.

library(picante)
match.phylo.otu = match.phylo.data(tree, otu)
PD <- pd(samp = match.phylo.otu$data,tree = match.phylo.otu$phy,
              include.root = FALSE)

But after trying to calculate Faith's PD, I get the output below, and the PD column is all 0s and nulls:

    There were 50 or more warnings (use warnings() to see the first 50)
...
    50: In drop.tip.phylo(tree, treeabsent) :
      drop all tips of the tree: returning NULL

I've manually re-created the tree twice, once using the R package seqinr and again using plain old Unix tools, and both trees produce the problem above. I can also manually copy and paste tip labels across the datasets and find the corresponding information.

Here I've generated a subset of each dataset. (The fact that I can subset by tip labels seems to contradict the problem I'm having with match.phylo.data().)

ex.otu <- otu[1:5,1:5]
ex.tax <- tax[rownames(tax) %in% rownames(ex.otu),]
library(castor)
ex.tree <- get_subtree_with_tips(tree,only_tips = rownames(ex.otu))
ex.tree <- ex.tree$subtree

> dput(ex.otu)
structure(list(NASQAN2015.147.348 = c(0L, 87L, 0L, 105L, 0L), 
    NASQAN2015.148.348 = c(0L, 57L, 0L, 23L, 21L), NASQAN2015.161.348 = c(17L, 
    77L, 0L, 146L, 0L), NASQAN2015.162.348 = c(0L, 38L, 0L, 95L, 
    0L), NASQAN2015.163.348 = c(0L, 39L, 0L, 7L, 0L)), row.names = c("ee866b92e722c35819b112aadc4ac885", 
"afc0eeec83a181be740331928d883362", "294c12d6881a2ed67aa1557cda9889ff", 
"466c2c4cb06ba39543c40a74c027008a", "fd0b270adb11f2781450c2e057e50f07"
), class = "data.frame")
> dput(ex.tax)
structure(list(Kingdom = c("d__Bacteria", "d__Bacteria", "d__Bacteria", 
"d__Bacteria", "d__Bacteria"), Phylum = c("p__Chloroflexi", "p__Actinobacteriota", 
"p__Actinobacteriota", "p__Planctomycetota", "p__Cyanobacteria"
), Class = c("c__Anaerolineae", "c__Actinobacteria", "c__Acidimicrobiia", 
"c__Planctomycetes", "c__Cyanobacteriia"), Order = c("o__Anaerolineales", 
"o__Frankiales", "o__Microtrichales", "o__Planctomycetales", 
"o__Chloroplast"), Family = c("f__Anaerolineaceae", "f__Sporichthyaceae", 
"f__Ilumatobacteraceae", "f__Rubinisphaeraceae", "f__Chloroplast"
), Genus = c("g__uncultured", "g__Candidatus_Planktophila", "g__CL500-29_marine_group", 
"g__uncultured", "g__Chloroplast"), Species = c("s__unclassified_g__uncultured", 
"s__unclassified_g__Candidatus_Planktophila", "s__bacterium_enrichment", 
"s__unclassified_g__uncultured", "s__Guillardia_theta")), row.names = c("294c12d6881a2ed67aa1557cda9889ff", 
"466c2c4cb06ba39543c40a74c027008a", "afc0eeec83a181be740331928d883362", 
"ee866b92e722c35819b112aadc4ac885", "fd0b270adb11f2781450c2e057e50f07"
), class = "data.frame")
> dput(ex.tree)
structure(list(Nnode = 4, tip.label = c("afc0eeec83a181be740331928d883362", 
"466c2c4cb06ba39543c40a74c027008a", "fd0b270adb11f2781450c2e057e50f07", 
"294c12d6881a2ed67aa1557cda9889ff", "ee866b92e722c35819b112aadc4ac885"
), node.label = c("0.954", "0.678", "0.813", "0.942"), edge = structure(c(6L, 
6L, 7L, 7L, 8L, 8L, 9L, 9L, 5L, 7L, 8L, 9L, 4L, 3L, 2L, 1L), dim = c(8L, 
2L)), edge.length = c(0.328657635, 0.016960962, 0.053311576, 
0.123171583, 0.177567591, 0.273360134, 0.177939166, 0.119027034
), root = 6, root.edge = 0.020729803), class = "phylo")

Update: I can even make a "new" tree by subsetting my original tree with the rownames of otu. This "new" tree is exactly the same as my original tree, but when it is used in match.phylo.data() and pd(), the same error occurs.

Upvotes: 1

Views: 66

Answers (1)

Geomicro

Reputation: 474

Heyo!

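Turns out I needed to transpose the match.phylo.otu$data object. pd() expects the community matrix with samples in rows and taxa in columns, but my table has ASVs in rows (which is the layout match.phylo.data() wants). Without the transpose, pd() was comparing my sample names against the tip labels, finding no matches, and dropping every tip. Below is an example that calculates Faith's PD.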

riverPD <- pd(samp = t(match.phylo.otu$data), tree = match.phylo.otu$phy,
              include.root = TRUE)
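
In case it helps anyone else, here is a rough base-R sketch of how the orientation could be checked before calling pd(). The helper name orient_for_pd is made up for this example (it is not part of picante), and the data are a three-sample slice of the dput() output from the question. It only assumes that the taxa are named by the tree's tip labels on one of the two dimensions.

```r
# Tip labels and a slice of the ASV table, copied from the question's dput() output.
tip_labels <- c("afc0eeec83a181be740331928d883362",
                "466c2c4cb06ba39543c40a74c027008a",
                "fd0b270adb11f2781450c2e057e50f07",
                "294c12d6881a2ed67aa1557cda9889ff",
                "ee866b92e722c35819b112aadc4ac885")

ex.otu <- data.frame(
  NASQAN2015.147.348 = c(0L, 87L, 0L, 105L, 0L),
  NASQAN2015.148.348 = c(0L, 57L, 0L, 23L, 21L),
  NASQAN2015.161.348 = c(17L, 77L, 0L, 146L, 0L),
  row.names = c("ee866b92e722c35819b112aadc4ac885",
                "afc0eeec83a181be740331928d883362",
                "294c12d6881a2ed67aa1557cda9889ff",
                "466c2c4cb06ba39543c40a74c027008a",
                "fd0b270adb11f2781450c2e057e50f07")
)

# Hypothetical helper (not a picante function): return the table as a matrix
# with samples in rows and taxa in columns, transposing only when needed.
orient_for_pd <- function(tab, tips) {
  tab <- as.matrix(tab)
  cols_are_taxa <- !is.null(colnames(tab)) && all(colnames(tab) %in% tips)
  rows_are_taxa <- !is.null(rownames(tab)) && all(rownames(tab) %in% tips)
  if (cols_are_taxa) return(tab)      # already samples x taxa
  if (rows_are_taxa) return(t(tab))   # taxa x samples, so flip it
  stop("neither row names nor column names match the tree tip labels")
}

comm <- orient_for_pd(ex.otu, tip_labels)
dim(comm)  # samples in rows, ASVs in columns
```

The resulting comm could then be passed as the samp argument, e.g. pd(samp = comm, tree = match.phylo.otu$phy). The stop() branch is there so that a genuine name mismatch fails loudly instead of silently producing a PD column of zeros.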

Upvotes: 0
