Reputation: 2644
I would like to write an R function for coloring the branches of a dendrogram, given a dendrogram object, a number of clusters, and a vector of colors. I want to use base R instead of dendextend.
Using the exact code from this answer to a similar question: https://stackoverflow.com/a/18036096/7064628
# Generate data
set.seed(12345)
desc.1 <- c(rnorm(10, 0, 1), rnorm(20, 10, 4))
desc.2 <- c(rnorm(5, 20, .5), rnorm(5, 5, 1.5), rnorm(20, 10, 2))
desc.3 <- c(rnorm(10, 3, .1), rnorm(15, 6, .2), rnorm(5, 5, .3))
data <- cbind(desc.1, desc.2, desc.3)
# Create dendrogram
d <- dist(data)
hc <- as.dendrogram(hclust(d))
# Function to color branches
colbranches <- function(n, col)
{
  a <- attributes(n) # Find the attributes of current node
  # Color edges with requested color
  attr(n, "edgePar") <- c(a$edgePar, list(col=col, lwd=2))
  n # Don't forget to return the node!
}
# Color the first sub-branch of the first branch in red,
# the second sub-branch in orange and the second branch in blue
hc[[1]][[1]] = dendrapply(hc[[1]][[1]], colbranches, "red")
hc[[1]][[2]] = dendrapply(hc[[1]][[2]], colbranches, "orange")
hc[[2]] = dendrapply(hc[[2]], colbranches, "blue")
# Plot
plot(hc)
In the code above, you have to select the branches to recolor manually. I would like a function that finds the k highest branches and changes the color of each of them (and of all their sub-branches). So far I have experimented with iteratively searching for the highest sub-branch, but that seems needlessly difficult. It would be great if there were a way to extract the heights of all branches, find the k highest, and change edgePar for each of their sub-branches.
Upvotes: 0
Views: 636
Reputation: 1
Here is a function I wrote to make this sort of processing of dendrograms more straightforward in base R. Note that you can subset a dendrogram as a nested list either by repeated subsetting or by vector subsetting. So dend[[1]][[1]][[2]] is equivalent to dend[[c(1,1,2)]].
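To make the equivalence concrete, here is a minimal check on a toy dendrogram. The data vector x and the names nested and vectored are made up for illustration; the values are chosen only so that the tree is balanced and every index used below is guaranteed to exist.

```r
# Toy data (an assumption for illustration): four tight pairs that merge
# into two groups of four, so the tree is balanced three levels deep.
x <- c(0, 1, 10, 11, 100, 101, 110, 111)
dend <- as.dendrogram(hclust(dist(x)))

# Repeated subsetting, one level at a time
nested <- dend[[1]][[1]][[2]]

# Vector subsetting, all levels in one call
vectored <- dend[[c(1, 1, 2)]]

# Both expressions should reach the same leaf
identical(nested, vectored)
is.leaf(vectored)
```

The vector form is what makes it practical to store a path to any node as a plain integer vector and reuse it later for both reading and assignment.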
So, my solution here is to make a list of all possible vector indices for the dendrogram. You can then cycle through the elements of that list, so in this example I'm using that list with a vapply() to provide a vector of the heights of all subdendrograms including leaves.
In the code below that function, I get the vector indices of the k highest branches (as you requested). Then I use your colbranches() function with dendrapply() to color the seven highest branches with seven colors.
dend_indices <- function(dend, leaves_only=TRUE) {
  # First layer: the direct children of the root
  prev_layer <- lapply(seq_along(dend), function(x) x)
  # Extend every index by one level; leaves (length 1) are kept as they are
  next_layer <- Reduce(c, lapply(prev_layer, function(i) {
    if (length(dend[[i]])>1) {
      lapply(seq_along(dend[[i]]), function(j) c(i, j))
    }else{
      list(i)
    }
  }))
  layers <- unique(c(prev_layer, next_layer))
  # Repeat until no index can be extended any further
  while(!identical(prev_layer, next_layer)) {
    prev_layer <- next_layer # plain assignment copies in R; no data.table needed
    next_layer <- Reduce(c, lapply(prev_layer, function(i) {
      if (length(dend[[i]])>1) {
        lapply(seq_along(dend[[i]]), function(j) c(i, j))
      }else{
        list(i)
      }
    }))
    layers <- unique(c(layers, next_layer))
  }
  if (leaves_only) next_layer else layers
}
dend <- hc # assumption: dend is the dendrogram built in the question
all_indices <- dend_indices(dend, leaves_only=FALSE)
# Height and depth of every subdendrogram, leaves included
heights <- vapply(all_indices, function(index) attr(dend[[index]], "height"), FUN.VALUE=numeric(1))
ordered_heights <- unique(heights[order(-heights)])
depths <- vapply(all_indices, function(index) length(index), FUN.VALUE=integer(1))
k <- 7
# Keep only the depths that hold more than k nodes
good_depths <- unique(depths)[vapply(unique(depths), function(depth) {
  sum(depths==depth)>k
}, FUN.VALUE=TRUE)]
# Lower the height threshold until at least k candidate branches lie above it
i <- 1
height <- ordered_heights[i]
while (sum(heights>height & depths %in% good_depths)<k) {
  height <- ordered_heights[i <- i + 1]
}
indices <- all_indices[heights>=height & depths %in% good_depths]
colors <- c("blue", "yellow", "orange", "green", "brown", "grey", "purple")
# Color each selected branch and everything below it
for (i in 1:k) {
  index <- indices[[i]]
  dend[[index]] <- dendrapply(dend[[index]], colbranches, colors[i])
}
Please keep in mind that this code colors the 7 highest branches. If there are 8 branches of equal height, you'll get two branches colored with the same color (unless you provide 8 colors).
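For comparison, here is a minimal base R sketch of a different route that skips the index bookkeeping. It is not the code above rewritten, just one possible alternative: it takes the (k-1)-th largest merge height as a cut level and colors every subtree that starts below that level. The names color_top_k, node_heights, paint and walk are made up for this sketch. It assumes merge heights never increase going down the tree (true for hclust methods such as "complete" and "average") and that there is no tie at the cut level; with a tie it would produce more than k groups, and the later ones would get an NA color.

```r
color_top_k <- function(dend, k, cols) {
  stopifnot(inherits(dend, "dendrogram"),
            k >= 1, k <= attr(dend, "members"), length(cols) >= k)

  # Heights of all internal (non-leaf) nodes
  node_heights <- function(n) {
    if (is.leaf(n)) return(numeric(0))
    c(attr(n, "height"),
      unlist(lapply(seq_along(n), function(i) node_heights(n[[i]]))))
  }
  h <- sort(node_heights(dend), decreasing = TRUE)

  # Subtrees whose root lies strictly below this level form the k groups
  cut_at <- if (k == 1) Inf else h[k - 1]

  # Set the edge color on a node and on everything beneath it
  paint <- function(n, col) {
    attr(n, "edgePar") <- c(attr(n, "edgePar"), list(col = col))
    if (!is.leaf(n)) for (i in seq_along(n)) n[[i]] <- paint(n[[i]], col)
    n
  }

  # Descend from the root; paint a subtree as soon as it falls below the cut
  next_col <- 0
  walk <- function(n) {
    if (is.leaf(n) || attr(n, "height") < cut_at) {
      next_col <<- next_col + 1
      return(paint(n, cols[next_col]))
    }
    for (i in seq_along(n)) n[[i]] <- walk(n[[i]])
    n
  }
  walk(dend)
}

# Example on a built-in data set
dend <- as.dendrogram(hclust(dist(USArrests), method = "average"))
colored <- color_top_k(dend, k = 3, cols = c("red", "orange", "blue"))
```

Calling plot(colored) draws the result. Edges above the cut level keep their default color, and colors are assigned in left-to-right order of the groups.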
Upvotes: 0
Reputation: 25376
The dendextend R package is designed for these tasks. You can see the many options for changing a dendrogram's branch colors in the vignette.
For example:
library(dendextend) # provides color_branches() and the %>% pipe
par(mfrow = c(1, 2))
dend <- USArrests %>% dist %>% hclust(method = "ave") %>% as.dendrogram
d1 <- color_branches(dend, k = 5, col = c(3, 1, 1, 4, 1))
plot(d1) # selective coloring of branches :)
d2 <- color_branches(d1, 5)
plot(d2)
Upvotes: 1