goshawk
goshawk

Reputation: 97

How to rotate nodes of a time-calibrated phylogenetic tree to match a particular order in R?

I have a time-calibrated phylogenetic tree from BEAST and I would like to make a figure in which its nodes are rotated to match an arbitrary ordering. The following code works perfectly to plot the tree with the nodes in the order they are in the input file.

library("phytools")
library("phyloch")
library("strap")
library("coda")

t <- read.beast("mcctree.tre") # I couldn't upload the file here

t$root.time <- t$height[1]

num_taxa <- length(t$tip.label)

display_all_node_bars <- TRUE

names_list <-vector()
for (name in t$tip){
  v <- strsplit(name, "_")[[1]]
  if(display_all_node_bars){
    names_list = c(names_list, name)
  }
  else if(v[length(v)]=="0"){
    names_list = c(names_list, name)
  }
}

nids <- vector()
pos <- 1
len_nl <- length(names_list)
for(n in names_list){
  for(nn in names_list[pos:len_nl]){
    if(n != nn){
      m <- getMRCA(t,c(n, nn))
      if(m %in% nids == FALSE){
        nids <- c(nids, m)
      }
    }
  }
  pos <- pos+1
}


pdf("tree.pdf", width = 20, height = 20)

geoscalePhylo(tree = t,
              x.lim = c(-2,21),
              units = c("Epoch"),
              tick.scale = "myr",
              boxes = FALSE,
              width = 1,
              cex.tip = 2,
              cex.age = 3,
              cex.ts = 2,
              erotate = 0,
              label.offset = 0.1)

lastPP <- get("last_plot.phylo", envir = .PlotPhyloEnv)

for(nv in nids){
  bar_xx_a <- c(lastPP$xx[nv]+t$height[nv-num_taxa]-t$"height_95%_HPD_MIN"[nv-num_taxa],
                lastPP$xx[nv]-(t$"height_95%_HPD_MAX"[nv-num_taxa]-t$height[nv-num_taxa]))
  lines(bar_xx_a, c(lastPP$yy[nv], lastPP$yy[nv]), col = rgb(0, 0, 1, alpha = 0.3), lwd = 12)
}

t$node.label <- t$posterior
p <- character(length(t$node.label))
p[t$node.label >= 0.95] <- "black"
p[t$node.label < 0.95 & t$node.label >= 0.75] <- "gray"
p[t$node.label < 0.75] <- "white"
nodelabels(pch = 21, cex = 1.5, bg = p)

dev.off()

enter image description here

The following code is my attempt to rotate the nodes in the way I want (following this tutorial: http://blog.phytools.org/2015/04/finding-closest-set-of-node-rotations.html). And it works for rotating the nodes. However, the blue bars indicating the confidence intervals of the divergence time estimates get out of their correct place - this is what I would like help to correct. This will be used in much larger files with hundreds of branches - the example here is simplified.

new.order <- c("Sp8","Sp9","Sp10","Sp7","Sp6","Sp5","Sp4","Sp2","Sp3","Ou1","Ou2","Sp1")

t2 <- setNames(1:Ntip(t), new.order)

new.order.tree <- minRotate(t, t2)

new.order.tree$root.time <- t$root.time
new.order.tree$height <- t$height
new.order.tree$"height_95%_HPD_MIN" <- t$"height_95%_HPD_MIN"
new.order.tree$"height_95%_HPD_MAX" <- t$"height_95%_HPD_MAX"

pdf("reordered_tree.pdf", width = 20, height = 20)

geoscalePhylo(tree = new.order.tree,
              x.lim = c(-2,21),
              units = c("Epoch"),
              tick.scale = "myr",
              boxes = FALSE,
              width = 1,
              cex.tip = 2,
              cex.age = 3,
              cex.ts = 2,
              erotate = 0,
              label.offset = 0.1)

lastPP <- get("last_plot.phylo", envir = .PlotPhyloEnv)

for(nv in nids){
  bar_xx_a <- c(lastPP$xx[nv]+new.order.tree$height[nv-num_taxa]-new.order.tree$"height_95%_HPD_MIN"[nv-num_taxa],
                lastPP$xx[nv]-(new.order.tree$"height_95%_HPD_MAX"[nv-num_taxa]-new.order.tree$height[nv-num_taxa]))
  lines(bar_xx_a, c(lastPP$yy[nv], lastPP$yy[nv]), col = rgb(0, 0, 1, alpha = 0.3), lwd = 12)
}

new.order.tree$node.label <- t$posterior
p <- character(length(new.order.tree$node.label))
p[new.order.tree$node.label >= 0.95] <- "black"
p[new.order.tree$node.label < 0.95 & new.order.tree$node.label >= 0.75] <- "gray"
p[new.order.tree$node.label < 0.75] <- "white"
nodelabels(pch = 21, cex = 1.5, bg = p)

dev.off()

enter image description here

I've found several similar questions here and in other forums, but none dealing specifically with time-calibrated trees - which is the core of the problem described above.

Upvotes: 1

Views: 428

Answers (1)

Martin Smith
Martin Smith

Reputation: 4087

The short answer is that phyTools::minRotate() doesn't recognize the confidence intervals as associated with nodes. If you contact the phyTools maintainers, they may well be able to add this functionality quite easily.

Meanwhile, you can correct this yourself. I don't know how read.beast() saves confidence intervals – let's say they're saved in t$conf.int. (Type unclass(t) at the R command line to see the full structure; you should be able to identify the appropriate property.)

If the tree's node labels are unique, then you can infer the new sequence of nodes using match():

library("phytools")
new.order <- c("Sp8","Sp9","Sp10","Sp7","Sp6","Sp5","Sp4","Sp2","Sp3","Ou1","Ou2","Sp1")

# Set up a fake initial tree -- you would load the tree from a file
tree <- rtree(length(new.order))
tree$tip.label <- sort(new.order)
tree$node.label <- seq_len(tree$Nnode)
tree$conf.int <- seq_len(tree$Nnode) * 10

# Plot tree
par(mfrow = c(1, 2), mar = rep(0, 4), cex = 0.9) # Create space
plot(tree, show.node.label = TRUE)
nodelabels(tree$conf.int, adj = 1) # Annotate "correct" intervals

# Re-order nodes with minRotate
noTree <- minRotate(tree, setNames(seq_along(new.order), new.order))
plot(noTree, show.node.label = TRUE)

# Move confidence intervals to correct node
tree$conf.int <- tree$conf.int[match(noTree$node.label, tree$node.label)]
nodelabels(tree$conf.int, adj = 1)

If you can't guarantee that the node labels are unique, you can always overwrite them in a temporary object:

# Find node order
treeCopy <- tree
treeCopy$node.label <- seq_len(tree$Nnode)
nodeOrder <- match(minRotate(treeCopy)$node.label, treeCopy$node.label)

# Apply node order
tree$conf.int <- tree$conf.int[nodeOrder]

Upvotes: 0

Related Questions