I'm using the `ggtree` package from Bioconductor to plot two phylogenetic trees. It works much like `ggplot2`, and I want to color the tip labels according to classes defined in an external CSV file.
I have a `multiPhylo` object that contains two different clusterings of the same 50 genes (we'll pretend there are only 6 for this example). When I evaluate `multitree[[1]]$tip.label` and `multitree[[2]]$tip.label`, both return the same list in the same order, so I know that although the plots are drawn differently, the genes are still stored in the same order.
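```r
library(ggtree)
library(ape)

# Random 6 x 6 distance matrix (36 values, so the data fills the matrix exactly)
mat <- as.dist(matrix(data = rexp(36, rate = 10), nrow = 6, ncol = 6))
nj.tree <- nj(mat)                    # neighbor-joining tree (package ape)
hclust.tree <- as.phylo(hclust(mat))  # hierarchical clustering converted to a phylo object
multitree <- c(nj.tree, hclust.tree)  # both trees combined into a multiPhylo object
```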
I want to plot these trees and then annotate them with external data: according to the existing literature, each gene belongs to one of 5 classes (A, B, C, D, and E).
```r
write.csv(multitree[[1]]$tip.label, "Genes.csv")
```

I used this command to create a CSV file listing the genes in the right order (not sure if that's relevant). I then manually entered the corresponding class letter in the column adjacent to each gene. It looks something like this:
| Gene | Class |
|------|-------|
| 1    | A     |
| 2    | A     |
| 3    | D     |
| 4    | C     |
| 5    | B     |
| 6    | E     |

And so on.
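Since the CSV rows only line up with the tips because they were written in `tip.label` order, one way to avoid depending on that order is to look up each tip's class by gene name. A minimal base-R sketch, assuming the CSV has columns named `Gene` and `Class` as shown above; the inline `text =` data stands in for the real file, and `tips` is a hypothetical stand-in for `multitree[[1]]$tip.label`:

```r
# Stand-in for read.csv("Genes.csv"); assumes columns named Gene and Class
g <- read.csv(text = "Gene,Class
1,A
2,A
3,D
4,C
5,B
6,E", stringsAsFactors = FALSE)

# Hypothetical tip labels in an arbitrary order (stand-in for multitree[[1]]$tip.label)
tips <- c("3", "1", "6", "2", "5", "4")

# match() returns, for each tip label, the row of g whose Gene equals it,
# so tip.class is aligned with the tips regardless of the CSV row order
tip.class <- g$Class[match(tips, as.character(g$Gene))]
tip.class
```

Any tip missing from the CSV comes back as `NA`, which makes gaps in the annotation easy to spot.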
I want the tip label colors on my tree to correspond to the classes in my external CSV table. I know it would look something like `geom_tiplab(aes(color = ...))`, but I don't know how to make it read the data in my CSV rather than the data within `multitree`. Here's what my ggtree command looks like:
```r
for (i in seq_along(multitree)) {
  myTree <- ggtree(multitree[[i]], aes(x, y)) +
    ggtitle(names(multitree)[i]) +
    geom_tiplab() +           # the tip labels I want to color by class
    theme_tree2() +
    coord_fixed(ratio = 0.5)
  print(myTree)               # print() is needed to display ggplot output inside a for loop
}
```
Answer:
Create a color vector for the class names from your table.
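```r
# stringsAsFactors = TRUE keeps Class a factor, so nlevels() works (the default is FALSE since R 4.0.0)
g <- read.csv("Genes.csv", stringsAsFactors = TRUE)
cols <- rainbow(nlevels(g$Class))  # one color per class
# Function to identify class color for a certain gene
findCol <- function(x) {
  col <- switch(as.character(x), A = cols[1], B = cols[2], C = cols[3], D = cols[4], E = cols[5])
  return(col)
}
col.vect <- sapply(g$Class, findCol)
```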
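Then use this vector in your `geom_tiplab()` call, e.g. `geom_tiplab(color = col.vect)`. This relies on the rows of the CSV being in the same order as the tree's tip labels.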
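The `switch()` helper can also be replaced by a named color vector indexed directly by class letter, which extends to any number of classes without editing a function. A sketch, assuming the class letters are read as a character vector; `classes`, `lv`, and `class.cols` are hypothetical names:

```r
# Classes as they might be read from the Class column of Genes.csv
classes <- c("A", "A", "D", "C", "B", "E")

# One color per class, named by class letter (sorted, like factor levels)
lv <- sort(unique(classes))
class.cols <- setNames(rainbow(length(lv)), lv)

# Indexing a named vector by name returns the matching colors, one per gene
col.vect <- unname(class.cols[classes])
col.vect
```

Because the levels are sorted alphabetically, A gets the first color, B the second, and so on, the same assignment the `switch()` version makes.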