giac

Reputation: 4299

R Dendrogram Parent-Child Clusters

I am trying to work out how to retrieve which clusters are children (descendants) of which parent clusters. Let me illustrate with the following plot.

The plot is an ordinary dendrogram showing several clustering solutions. What I would like to draw is the path from the larger clusters down to the smaller ones. My dataset is very large and the clusters are complex, so I need to understand which small clusters descend from which large clusters.

enter image description here

# Load data
data(USArrests)

# Compute distances and hierarchical clustering
dd <- dist(scale(USArrests), method = "euclidean")
hc <- hclust(dd, method = "ward.D2")

par(mfrow = c(2,2))
# Plot the obtained dendrogram
plot(hc, cex = 0.6, hang = -1)
rect.hclust(hc, k = 2, border = 2:5)

plot(hc, cex = 0.6, hang = -1)
rect.hclust(hc, k = 4, border = 2:5)

plot(hc, cex = 0.6, hang = -1)
rect.hclust(hc, k = 8, border = 2:5)

plot(hc, cex = 0.6, hang = -1)
rect.hclust(hc, k = 12, border = 2:5)

For instance, take two of these solutions: 2 clusters and 4 clusters. It is unclear to me how to tell which of the sub_grp2 clusters were split off from each of the two sub_grp1 clusters (and so on).

# Cut the tree into 2, 4, 8 and 12 groups
sub_grp1 <- cutree(hc, k = 2)
sub_grp2 <- cutree(hc, k = 4)
sub_grp3 <- cutree(hc, k = 8)
sub_grp4 <- cutree(hc, k = 12)

# Store the cluster labels alongside the data
USArrests$sub_grp1 <- sub_grp1
USArrests$sub_grp2 <- sub_grp2
USArrests$sub_grp3 <- sub_grp3
USArrests$sub_grp4 <- sub_grp4

What I really would like to draw, or retrieve in any way, is something like:

enter image description here

This would really help me know which of the smaller clusters "descend" from the larger ones.

Does that make sense?

Upvotes: 2

Views: 876

Answers (2)

G5W

Reputation: 37641

One solution would be to convert your dendrogram to an igraph graph and use the plotting tools available in igraph.

With all 50 states it is a little crowded, but you can see the tree structure.

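## Convert to a phylo, then to igraph
library(ape)     # as.phylo() and the as.igraph() method for phylo objects
library(igraph)  # layout_as_tree() and plotting of igraph objects
PH <- as.phylo(hc)
IG <- as.igraph(PH)

## Make a nice layout: swap x and y so the tree grows sideways,
## then stretch the horizontal axis
LO <- layout_as_tree(IG)
LO2 <- LO[, 2:1]
LO2[, 1] <- LO2[, 1] * 6

## Plot
plot(IG, layout = LO2, vertex.size = 80, edge.arrow.size = 0.5,
     rescale = FALSE, vertex.label.cex = 0.8,
     xlim = range(LO2[, 1]), ylim = range(LO2[, 2]))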

Dendrogram of states
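
If you only need the parent-child relationships rather than a picture, they can also be read straight from the merge matrix that hclust() returns. The following is a minimal sketch, not part of the approach above: in hc$merge, row i describes merge step i, a negative entry is a single observation and a positive entry refers to an earlier merge step. The helper node_name() and the "mergeN" labels are names I made up for illustration.

```r
data(USArrests)
hc <- hclust(dist(scale(USArrests), method = "euclidean"), method = "ward.D2")

n <- nrow(hc$merge) + 1  # number of observations

# Hypothetical helper: label leaves by state name, internal nodes as "mergeN"
node_name <- function(x) ifelse(x < 0, hc$labels[abs(x)], paste0("merge", x))

# One row per parent-child edge: each merge step has exactly two children
edges <- data.frame(
  parent = rep(paste0("merge", seq_len(n - 1)), times = 2),
  child  = c(node_name(hc$merge[, 1]), node_name(hc$merge[, 2])),
  stringsAsFactors = FALSE
)

# The last merge step is the root; its two children are the k = 2 clusters
print(subset(edges, parent == paste0("merge", n - 1)))
```

Every node except the root appears exactly once in the child column, so looking up a node there gives its parent directly.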

Upvotes: 2

StupidWolf

Reputation: 46898

You can try the clustree package. The clusters may not appear in the same order as in the dendrogram, but you can see how they relate to each other:

library(clustree)
data(USArrests)

# Compute distances and hierarchical clustering
dd <- dist(scale(USArrests), method = "euclidean")
hc <- hclust(dd, method = "ward.D2")

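# Cut the tree at several values of k; one column of labels per k
Ks <- c(1, 2, 4, 6, 8)
clus_results <- sapply(Ks, function(i) {
  cutree(hc, i)
})

# clustree picks up the columns by their shared prefix
colnames(clus_results) <- paste0("K", Ks)
clustree(clus_results, prefix = "K")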

enter image description here
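
To retrieve the mapping as data rather than as a plot, you could cross-tabulate two cutree() solutions. This is a rough sketch under the assumption that both cuts come from the same hclust object, so the solutions are nested and every child cluster falls entirely inside one parent; the names parent, child, xt and parent_of are only for illustration.

```r
data(USArrests)
hc <- hclust(dist(scale(USArrests), method = "euclidean"), method = "ward.D2")

# Cluster labels at two resolutions of the same tree
parent <- cutree(hc, k = 2)
child  <- cutree(hc, k = 4)

# Rows are parent clusters, columns are child clusters;
# each column has non-zero counts in exactly one row
xt <- table(parent = parent, child = child)
print(xt)

# For each child cluster, the label of the parent cluster that contains it
parent_of <- apply(xt, 2, function(counts) as.integer(names(which(counts > 0))))
print(parent_of)
```

The same lookup works for any pair of k values, for example k = 4 against k = 8, as long as the smaller k is used for the parent.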

Upvotes: 2
