Ed Doe

R annotate phylogenetic tree with external data?

I'm using the ggtree package from Bioconductor to plot two phylogenetic trees. It works essentially like ggplot2, and I want to modify the aesthetics of the tip labels to match classes set by an external CSV file.

I have a multiPhylo object that contains two different clusterings of the same 50 genes (we'll pretend there are only 6 for this example). When I evaluate multitree[[1]]$tip.label and multitree[[2]]$tip.label they both give me the same list in the same order, so I know that while the plots are displayed differently, the genes are still stored in the same order.

library(ggtree)
library(ape)

set.seed(1)  # make the random example reproducible
mat <- as.dist(matrix(data = rexp(36, rate = 10), nrow = 6, ncol = 6))  # a 6 x 6 matrix needs 36 values
nj.tree <- nj(mat)  ### Package ape
hclust.tree <- as.phylo(hclust(mat))
multitree <- c(nj.tree, hclust.tree)
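The difference between the order in which tips are stored and the order in which they are drawn can also be seen with base R's hclust() alone, without ape or ggtree. A minimal sketch, assuming made-up gene names gene1 to gene6: `labels` keeps the input order, while `order` only records the left-to-right order used when plotting.

```r
set.seed(1)
# Hypothetical gene names; rows are genes, columns are arbitrary features
m <- matrix(rexp(36, rate = 10), nrow = 6,
            dimnames = list(paste0("gene", 1:6), NULL))
hc <- hclust(dist(m))

hc$labels            # stored order: same as the input rows
hc$labels[hc$order]  # display order: left to right in plot(hc)
```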

I want to plot these trees and then annotate them with external data based on which of five classes (A, B, C, D, and E) each gene belongs to according to the existing literature.

write.csv(multitree[[1]]$tip.label, "Genes.csv")

I used this command to write a CSV file listing the genes in tip-label order (not sure if that's relevant). I then manually entered the corresponding class letter in the column adjacent to each gene. It looks something like this:

Gene    Class
1       A
2       A
3       D
4       C
5       B
6       E

And so on.
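If the CSV is generated from R, giving the gene column an explicit name and dropping the row-name column makes it easier to read back and match against the tree. A minimal sketch, assuming a temporary file in place of Genes.csv and the stand-in labels "1" to "6" in place of multitree[[1]]$tip.label:

```r
tip_labels <- as.character(1:6)      # stand-in for multitree[[1]]$tip.label
path <- tempfile(fileext = ".csv")   # hypothetical path instead of "Genes.csv"

# Write a template with a named Gene column and an empty Class column;
# row.names = FALSE avoids the extra unnamed index column
write.csv(data.frame(Gene = tip_labels, Class = ""), path, row.names = FALSE)

# After filling in Class by hand, read the file back.
# colClasses = "character" keeps gene names such as "1" as text,
# so they compare equal to the tree's (character) tip labels
g <- read.csv(path, colClasses = "character")
```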

I want the colors of the tip labels on my tree to correspond to the classes defined in my external CSV table. I know it would look something like geom_tiplab(aes(color=something something something)), but I don't know how to make it read the data in my CSV rather than the data in multitree. Here's what my ggtree command looks like:

library(ggplot2)  # ggtitle() and coord_fixed() come from ggplot2

for (i in seq_along(multitree)) {
    myTree <- ggtree(multitree[[i]], aes(x, y)) + 
        ggtitle(names(multitree)[i]) + 
        geom_tiplab() +   ### What I want to annotate with color
        theme_tree2() + 
        coord_fixed(ratio = 0.5) 
    print(myTree)         ### Forces ggplot output to display inside the for loop
}

Answers (1)

nya

Create a vector of colors, one per gene, from the Class column of your table.

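# stringsAsFactors = TRUE makes Class a factor; since R 4.0.0 the default is FALSE,
# and nlevels() of a character column returns 0
g <- read.csv("Genes.csv", stringsAsFactors = TRUE)
cols <- rainbow(nlevels(g$Class))  # one color per class (A to E)

# Function to identify class color for a certain gene 
findCol <- function(x){
    col <- switch(as.character(x), A=cols[1], B=cols[2], C=cols[3], D=cols[4], E=cols[5])
    return(col)
}
col.vect <- sapply(as.character(g$Class), findCol, USE.NAMES = FALSE)  # one color per gene, in CSV row order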
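A variation on the same idea (not the code above) uses a named color vector and match(), so that colors line up with the tips by gene name rather than by row position in the CSV. A minimal sketch, with hypothetical stand-ins `tip_labels` (deliberately shuffled) and an in-memory `g` in place of the tree and Genes.csv:

```r
# Hypothetical stand-ins: tip labels in tree order, and the CSV contents
tip_labels <- c("3", "1", "2", "6", "5", "4")
g <- data.frame(Gene = as.character(1:6),
                Class = c("A", "A", "D", "C", "B", "E"))

# One color per class, named so a class letter can be looked up directly
classes <- sort(unique(g$Class))
cols <- setNames(rainbow(length(classes)), classes)

# Align CSV rows to the tips by gene name, then look up each tip's color
class_per_tip <- g$Class[match(tip_labels, g$Gene)]
col.vect <- unname(cols[class_per_tip])
```

Because the lookup goes through match(), the CSV rows can be in any order; genes missing from the CSV come out as NA.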

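Then pass this vector to geom_tiplab(), for example geom_tiplab(color = col.vect). This assumes the rows of Genes.csv are in the same order as the tree's tip labels, which holds if the file was written from multitree[[1]]$tip.label as in the question.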
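As an alternative to building the color vector by hand, ggtree provides the %<+% operator, which attaches a data frame to the tree when the data frame's first column matches the tip labels; columns such as Class can then be mapped inside aes(). A minimal sketch, assuming a random six-tip tree from ape::rtree() and the class table typed in directly rather than read from Genes.csv:

```r
library(ape)
library(ggtree)
library(ggplot2)

set.seed(1)
# Hypothetical stand-in tree whose tip labels are "1" to "6"
tree <- rtree(6, tip.label = as.character(1:6))

# Class table as it might come from Genes.csv; the first column must hold the tip labels
g <- data.frame(Gene = as.character(1:6),
                Class = c("A", "A", "D", "C", "B", "E"))

# %<+% joins g onto the tree data by tip label, so Class becomes a mappable column
p <- ggtree(tree) %<+% g +
    geom_tiplab(aes(color = Class)) +
    theme_tree2()
print(p)
```

Since the join is by label, the CSV rows do not need to be in tip order, and the same g can be attached to each tree in multitree inside the loop.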
