ira

Reputation: 2644

Function to color branches in dendrogram plot using base R

I would like to write an R function that colors the branches of a dendrogram, given a dendrogram object, a number of clusters, and a vector of colors. I want to use base R instead of dendextend.

Here is the exact code from this answer to a similar question: https://stackoverflow.com/a/18036096/7064628

# Generate data
set.seed(12345)
desc.1 <- c(rnorm(10, 0, 1), rnorm(20, 10, 4))
desc.2 <- c(rnorm(5, 20, .5), rnorm(5, 5, 1.5), rnorm(20, 10, 2))
desc.3 <- c(rnorm(10, 3, .1), rnorm(15, 6, .2), rnorm(5, 5, .3))

data <- cbind(desc.1, desc.2, desc.3)

# Create dendrogram
d <- dist(data) 
hc <- as.dendrogram(hclust(d))

# Function to color branches
colbranches <- function(n, col)
  {
  a <- attributes(n) # Find the attributes of current node
  # Color edges with requested color
  attr(n, "edgePar") <- c(a$edgePar, list(col=col, lwd=2))
  n # Don't forget to return the node!
  }

# Color the first sub-branch of the first branch in red,
# the second sub-branch in orange and the second branch in blue
hc[[1]][[1]] = dendrapply(hc[[1]][[1]], colbranches, "red")
hc[[1]][[2]] = dendrapply(hc[[1]][[2]], colbranches, "orange")
hc[[2]] = dendrapply(hc[[2]], colbranches, "blue")

# Plot
plot(hc)

In the code above, you have to select the branches to recolor manually. I would like a function that finds the k highest branches and changes the color of each of them (and all of their sub-branches). So far I have experimented with iteratively searching for the highest sub-branch, but that seems needlessly difficult. It would be great if there were a way to extract the heights of all branches, find the k highest, and change the edgePar for each of their sub-branches.

Upvotes: 0

Views: 636

Answers (2)

Kevin Bumgartner

Reputation: 1

Here is a function I wrote to make this sort of dendrogram processing more straightforward in base R. Note that you can subset a dendrogram as a nested list either by repeated subsetting or by vector subsetting. So dend[[1]][[1]][[2]] is equivalent to dend[[c(1,1,2)]].
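To make the two subsetting styles concrete, here is a minimal sketch on toy data. The vector x and the names toy, nested and by_vec are made up for illustration and are not part of the question's data; the points are chosen so that the tree is fully balanced and the path 1, 1, 2 is sure to exist.

```r
# Eight points in well-separated pairs give a fully balanced tree,
# so every path of length three ends at a leaf (toy data, assumed for illustration).
x <- c(0, 1, 10, 11, 100, 101, 110, 111)
toy <- as.dendrogram(hclust(dist(x)))

nested <- toy[[1]][[1]][[2]]   # repeated subsetting, one level at a time
by_vec <- toy[[c(1, 1, 2)]]    # the same path given as one index vector

# Both expressions should reach the same leaf
print(attr(nested, "label"))
print(attr(by_vec, "label"))
print(identical(nested, by_vec))
```

The index-vector form is what makes it possible to store a path to any node as a plain integer vector and revisit that node later.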

So, my solution here is to make a list of all possible vector indices for the dendrogram. You can then cycle through the elements of that list. In this example I pass the list to vapply() to get a vector of the heights of all subdendrograms, including leaves.

In the code below that function, I get the vector indices of the k highest branches (as you requested). Then I use your colbranches() function with dendrapply() to color the seven highest branches with seven colors.

dend_indices <- function(dend, leaves_only=TRUE) {
    # Extend every index vector by one level; leaves (length 1) stay unchanged
    expand <- function(layer) {
        Reduce(c, lapply(layer, function(i) {
            if (length(dend[[i]])>1) {
                lapply(seq_along(dend[[i]]), function(j) c(i, j))
            }else{
                list(i)
            }
        }))
    }
    prev_layer <- as.list(seq_along(dend))
    next_layer <- expand(prev_layer)
    layers <- unique(c(prev_layer, next_layer))
    # Keep descending until nothing can be expanded any further
    while(!identical(prev_layer, next_layer)) {
        prev_layer <- next_layer  # plain assignment; no data.table needed
        next_layer <- expand(prev_layer)
        layers <- unique(c(layers, next_layer))
    }
    if (leaves_only) next_layer else layers
}

# Build the dendrogram (d is the distance object from the question)
dend <- as.dendrogram(hclust(d))

all_indices <- dend_indices(dend, leaves_only=FALSE)
# Height and depth (length of the index vector) of every subdendrogram
heights <- vapply(all_indices, function(index) attr(dend[[index]], "height"), FUN.VALUE=numeric(1))
ordered_heights <- unique(heights[order(-heights)])
depths <- vapply(all_indices, function(index) length(index), FUN.VALUE=integer(1))

k <- 7
# Only consider depths that contain more than k nodes
good_depths <- unique(depths)[vapply(unique(depths), function(depth) {
    sum(depths==depth)>k
}, FUN.VALUE=logical(1))]
# Lower the height threshold until at least k candidate nodes lie above it
i <- 1
height <- ordered_heights[i]
while (sum(heights>height & depths %in% good_depths)<k) {
    height <- ordered_heights[i <- i + 1]
}
indices <- all_indices[heights>=height & depths %in% good_depths]
colors <- c("blue", "yellow", "orange", "green", "brown", "grey", "purple")
# Color the first k selected branches, one color each
for (i in 1:k) {
    index <- indices[[i]]
    dend[[index]] <- dendrapply(dend[[index]], colbranches, colors[i])
}

Please keep in mind that this code colors the 7 highest branches. If, say, 8 branches share the same height, you'll get two branches colored with the same color (unless you provide 8 colors).
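One way to sidestep the tie problem is to split on a height threshold instead of picking individual nodes. The following is only a sketch of that idea, not the code above: internal_heights(), paint() and color_top_k() are hypothetical helper names, and it assumes merge heights never decrease from child to parent (true for complete, average and single linkage). Branches tied at the threshold are left unsplit, so ties can yield fewer than k colored clusters but never more.

```r
# Heights of all internal (non-leaf) nodes, collected recursively
internal_heights <- function(node) {
  if (is.leaf(node)) return(numeric(0))
  c(attr(node, "height"), unlist(lapply(node, internal_heights)))
}

# Color a node and everything below it (same edgePar idea as colbranches)
paint <- function(node, col) {
  attr(node, "edgePar") <- c(attr(node, "edgePar"), list(col = col, lwd = 2))
  if (!is.leaf(node)) {
    for (i in seq_along(node)) node[[i]] <- paint(node[[i]], col)
  }
  node
}

# Split every node strictly above the k-th highest merge height,
# then give each remaining subtree its own color
color_top_k <- function(dend, k, cols) {
  stopifnot(k >= 1, length(cols) >= k)
  h <- sort(internal_heights(dend), decreasing = TRUE)
  threshold <- if (k <= length(h)) h[k] else -Inf
  next_col <- 0
  walk <- function(node) {
    if (is.leaf(node) || attr(node, "height") <= threshold) {
      next_col <<- next_col + 1
      return(paint(node, cols[next_col]))
    }
    for (i in seq_along(node)) node[[i]] <- walk(node[[i]])
    node
  }
  walk(dend)
}

# Usage on a built-in data set
dend <- as.dendrogram(hclust(dist(USArrests), method = "average"))
colored <- color_top_k(dend, k = 5,
                       cols = c("red", "orange", "blue", "green", "purple"))
```

Calling plot(colored) afterwards draws the result. Because the split happens at a height rather than at hand-picked index vectors, nested selections cannot overwrite each other's colors.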

Upvotes: 0

Tal Galili

Reputation: 25376

The dendextend R package is designed for these tasks. You can see the many options for changing a dendrogram's branch colors in the vignette.

For example:

library(dendextend)  # provides color_branches() and re-exports %>%

par(mfrow = c(1, 2))
dend <- USArrests %>% dist %>% hclust(method = "ave") %>% as.dendrogram
d1 <- color_branches(dend, k = 5, col = c(3, 1, 1, 4, 1))
plot(d1) # selective coloring of branches :)
d2 <- color_branches(d1, 5)
plot(d2)

Upvotes: 1
