Reputation: 41
These questions are related to my other question at Phylogenetic model using multiple entries for each species Thanks to @thomas-guillerme, I was able to start running an MCMCglmm model.
Although I had no problem running some of my example files in which I had a single entry for each of the species in my tree, I found an error message when trying to run my original dataset, which consists of thousands of entries for each of the species in my tree. When running:
comp_data <- comparative.data(phy = my_tree, data =my_data, names.col = species, vcv = TRUE)’
I got an error:
'Error in row.names<-.data.frame(tmp, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': ‘Species1’, ‘Species2’, ‘Species3’, ‘Species4’,...
I was surprised because I am using MCMCglmm and not PGLS because of the chance of using multiple entries for each species.
I tried the workaround of make the species name unique but in that case only the first entry of each species is recognized later in the model (because it corresponds with the name in my_tree).
Moreover, I had problems with having my tree recognized as ultrametric. I checked it using
'is.ultrametric(my_tree)'
Got:
FALSE
I tried:
function (phy) { if(any(is.ultrametric(my_tree)) == FALSE) { my_tree <- lapply(my_tree, chronoMPL) class(my_tree) <- "Phylo"
}
}
But these lines apparently do not solve the problem. Thanks in advance for your help.
Upvotes: 0
Views: 626
Reputation: 1857
Hard to tell without a running example but for the second question at least, it seems that the bug comes from the phy
argument not being passed to the function at all (it's using my_tree
check.fun <- function(my_tree) {
if(any(is.ultrametric(my_tree)) == FALSE) {
my_tree <- lapply(my_tree, chronoMPL)
class(my_tree) <- "Phylo"
}
return(my_tree)
}
For the first point, you might want to try to run it through the mulTree
package that does a lot of housekeeping:
## Loading/installing the package
library(devtools)
install_github("TGuillerme/mulTree")
library(mulTree)
## Loading the example data
data(lifespan)
## Randomly combining trees
combined_trees <- tree.bind(x = trees_mammalia, y = trees_aves, sample = 2,
root.age = 250)
We can then generate an example with multiple specimens per species:
## Subset of the data
data <- lifespan_volant[sample(nrow(lifespan_volant), 30),]
## Create a dataset with two specimen per species
data <- rbind(cbind(data, specimen = rep("spec1", 30)), cbind(data,
specimen = rep("spec2", 30)))
Note that the first column contains the list of species with multiple specimens per species (specified in column $specimen
)
head(data[order(data$species),])
# species class longevity mass volant specimen
#16 Addax_nasomaculatus Mammalia 0.8413927 1.8227058 nonvolant spec1
#161 Addax_nasomaculatus Mammalia 0.8413927 1.8227058 nonvolant spec2
#140 Anser_anser Aves 0.9929849 0.5993055 volant spec1
#1401 Anser_anser Aves 0.9929849 0.5993055 volant spec2
#21 Antilope_cervicapra Mammalia 0.6055864 1.4910746 nonvolant spec1
#211 Antilope_cervicapra Mammalia 0.6055864 1.4910746 nonvolant spec2
You can then use the clean.data
function to make sure the trees match the dataset (specifying which column contains the species names)
## Making sure both the trees and the data match
cleaned_data <- clean.data(data, combined_trees, data.col = "species")
## Only using the cleaned version
trees <- cleaned_data$tree
data <- cleaned_data$data
You can find the eventual dropped tips/rows in cleaned_data$dropped_tips
and cleaned_data$dropped_rows
:
## Creates a mulTree object specifying species AND specimens as random terms
mulTree_data <- as.mulTree(data, trees, taxa = "species",
rand.terms = ~species+specimen)
## formula to test
test_formula <- longevity ~ mass + volant
## MCMC parameters (number of generations, thin/sampling, burnin)
mcmc_parameters <- c(101000, 10, 1000)
## priors
mcmc_priors <- list(R = list(V = 1/2, nu = 0.002),
G = list(G1 = list(V = 1/2, nu = 0.002)))
## Running MCMCglmm on multiple trees
mulTree(mulTree_data, formula = test_formula, parameters = mcmc_parameters,
priors = mcmc_priors, output = "longevity.example", ESS = 50)
To analyse the resulting files, you can use read.mulTree
and subsequent functions (see the mulTree
manual).
Upvotes: 1