Marco Gamba
Marco Gamba

Reputation: 41

MCMCglmm questions: multiple species and ultrametric trees

These questions are related to my other question at Phylogenetic model using multiple entries for each species Thanks to @thomas-guillerme, I was able to start running an MCMCglmm model.

Although I had no problem running some of my example files in which I had a single entry for each of the species in my tree, I found an error message when trying to run my original dataset, which consists of thousands of entries for each of the species in my tree. When running:

comp_data <- comparative.data(phy = my_tree, data =my_data, names.col = species, vcv = TRUE)’  

I got an error:

'Error in row.names<-.data.frame(tmp, value = value) : duplicate 'row.names' are not allowed In addition: Warning message: non-unique values when setting 'row.names': ‘Species1’, ‘Species2’, ‘Species3’, ‘Species4’,...

I was surprised because I am using MCMCglmm and not PGLS because of the chance of using multiple entries for each species.

I tried the workaround of make the species name unique but in that case only the first entry of each species is recognized later in the model (because it corresponds with the name in my_tree).

Moreover, I had problems with having my tree recognized as ultrametric. I checked it using

'is.ultrametric(my_tree)'

Got:

FALSE

I tried:

    function (phy) { if(any(is.ultrametric(my_tree)) == FALSE) { my_tree <- lapply(my_tree, chronoMPL) class(my_tree) <- "Phylo" 
 }
}

But these lines apparently do not solve the problem. Thanks in advance for your help.

Upvotes: 0

Views: 626

Answers (1)

Thomas Guillerme
Thomas Guillerme

Reputation: 1857

Hard to tell without a running example but for the second question at least, it seems that the bug comes from the phy argument not being passed to the function at all (it's using my_tree

check.fun <- function(my_tree) {
    if(any(is.ultrametric(my_tree)) == FALSE) {
         my_tree <- lapply(my_tree, chronoMPL)
         class(my_tree) <- "Phylo" 
    }
    return(my_tree)
}

For the first point, you might want to try to run it through the mulTree package that does a lot of housekeeping:

## Loading/installing the package
library(devtools)
install_github("TGuillerme/mulTree")
library(mulTree)

## Loading the example data
data(lifespan)
## Randomly combining trees
combined_trees <- tree.bind(x = trees_mammalia, y = trees_aves, sample = 2,
                            root.age = 250)

We can then generate an example with multiple specimens per species:

## Subset of the data
data <- lifespan_volant[sample(nrow(lifespan_volant), 30),]
## Create a dataset with two specimen per species
data <- rbind(cbind(data, specimen = rep("spec1", 30)), cbind(data,
             specimen = rep("spec2", 30)))

Note that the first column contains the list of species with multiple specimens per species (specified in column $specimen)

head(data[order(data$species),])
#                 species    class longevity      mass    volant specimen
#16   Addax_nasomaculatus Mammalia 0.8413927 1.8227058 nonvolant    spec1
#161  Addax_nasomaculatus Mammalia 0.8413927 1.8227058 nonvolant    spec2
#140          Anser_anser     Aves 0.9929849 0.5993055    volant    spec1
#1401         Anser_anser     Aves 0.9929849 0.5993055    volant    spec2
#21   Antilope_cervicapra Mammalia 0.6055864 1.4910746 nonvolant    spec1
#211  Antilope_cervicapra Mammalia 0.6055864 1.4910746 nonvolant    spec2

You can then use the clean.data function to make sure the trees match the dataset (specifying which column contains the species names)

## Making sure both the trees and the data match
cleaned_data <- clean.data(data, combined_trees, data.col = "species")

## Only using the cleaned version
trees <- cleaned_data$tree
data <- cleaned_data$data

You can find the eventual dropped tips/rows in cleaned_data$dropped_tips and cleaned_data$dropped_rows:

## Creates a mulTree object specifying species AND specimens as random terms
mulTree_data <- as.mulTree(data, trees, taxa = "species",
                        rand.terms = ~species+specimen)

## formula to test
test_formula <- longevity ~ mass + volant
## MCMC parameters (number of generations, thin/sampling, burnin)
mcmc_parameters <- c(101000, 10, 1000)
## priors
mcmc_priors <- list(R = list(V = 1/2, nu = 0.002),
                    G = list(G1 = list(V = 1/2, nu = 0.002)))

## Running MCMCglmm on multiple trees
mulTree(mulTree_data, formula = test_formula, parameters = mcmc_parameters,
     priors = mcmc_priors, output = "longevity.example", ESS = 50)

To analyse the resulting files, you can use read.mulTree and subsequent functions (see the mulTree manual).

Upvotes: 1

Related Questions