Reputation: 4299
I am trying to find out how I can retrieve which clusters are "children/descendants" of which "parent" clusters. Let me illustrate this with the following plot.
This plot shows the same dendrogram cut into different clustering solutions (k = 2, 4, 8 and 12). What I would like to draw is the path from the larger clusters down to the smaller ones. My dataset is very large and the clusters are complex, so I need to understand which small clusters "descend" from which large clusters.
# Load data
data(USArrests)
# Compute distances and hierarchical clustering
dd <- dist(scale(USArrests), method = "euclidean")
hc <- hclust(dd, method = "ward.D2")
par(mfrow = c(2,2))
# Plot the obtained dendrogram
plot(hc, cex = 0.6, hang = -1)
rect.hclust(hc, k = 2, border = 2:5)
plot(hc, cex = 0.6, hang = -1)
rect.hclust(hc, k = 4, border = 2:5)
plot(hc, cex = 0.6, hang = -1)
rect.hclust(hc, k = 8, border = 2:5)
plot(hc, cex = 0.6, hang = -1)
rect.hclust(hc, k = 12, border = 2:5)
For instance, here I have two solutions: 2 clusters and 4 clusters. It is unclear to me how I can know which of the sub_grp2 clusters were split from which of the 2 sub_grp1 clusters (and so on).
# Cut the tree into 2, 4, 8 and 12 groups
sub_grp1 <- cutree(hc, k = 2)
sub_grp2 <- cutree(hc, k = 4)
sub_grp3 <- cutree(hc, k = 8)
sub_grp4 <- cutree(hc, k = 12)
USArrests$sub_grp1 = sub_grp1
USArrests$sub_grp2 = sub_grp2
USArrests$sub_grp3 = sub_grp3
USArrests$sub_grp4 = sub_grp4
What I really would like to draw, or retrieve in any way, is something like:
This would really help me know which of the smaller clusters "descend" from the larger ones.
Does that make sense?
Upvotes: 2
Views: 876
Reputation: 37641
One solution would be to convert your dendrogram to an igraph graph and use the plotting tools available in igraph.
With all 50 states it is a little crowded, but you can see the tree structure.
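## Convert to a phylo, then to igraph
library(ape)
library(igraph)  # needed for layout_as_tree() and the igraph plot method
PH = as.phylo(hc)
IG = as.igraph(PH)
## Make a nice layout
LO = layout_as_tree(IG)
## Swap the x and y columns so the tree is drawn sideways, then stretch the x axis
LO2 = LO[,2:1]
LO2[,1] = LO2[,1]*6
## plot
plot(IG, layout=LO2, vertex.size=80, edge.arrow.size=0.5,
     rescale=FALSE, vertex.label.cex = 0.8,
     xlim=range(LO2[,1]), ylim=range(LO2[,2]))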
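If the full tree is too crowded, a variation on the same idea is to plot one node per cluster rather than one node per state. The following is only a sketch under a few assumptions: the cut levels in Ks, the node labels of the form k<level>_<id>, and the object names cuts, labs, edges and g are all choices made for this illustration and are not part of the code above.

```r
library(igraph)

data(USArrests)
hc <- hclust(dist(scale(USArrests), method = "euclidean"), method = "ward.D2")

# Cut levels to compare (an arbitrary choice for this sketch)
Ks <- c(1, 2, 4, 8, 12)
# cutree() with a vector of k returns a matrix with one column per k
cuts <- cutree(hc, k = Ks)

# Label every cluster as "k<level>_<id>" so ids stay unique across levels
labs <- sapply(seq_along(Ks), function(j) paste0("k", Ks[j], "_", cuts[, j]))

# One edge per distinct (parent, child) pair between successive levels
edges <- do.call(rbind, lapply(seq_len(ncol(labs) - 1), function(j) {
  unique(data.frame(from = labs[, j], to = labs[, j + 1]))
}))

# Build a directed graph of clusters and draw it as a tree
g <- graph_from_data_frame(edges, directed = TRUE)
plot(g, layout = layout_as_tree(g), vertex.size = 20,
     vertex.label.cex = 0.7, edge.arrow.size = 0.3)
```

The edges data frame also answers the question without any plot: each row pairs a cluster with the cluster it was split from at the previous level.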
Upvotes: 2
Reputation: 46898
You can try the clustree package. The clusters may not appear in the same order as in the dendrogram, but you can see how they relate to each other:
library(clustree)
data(USArrests)
# Compute distances and hierarchical clustering
dd <- dist(scale(USArrests), method = "euclidean")
hc <- hclust(dd, method = "ward.D2")
Ks = c(1,2,4,6,8)
clus_results = sapply(Ks,function(i){
cutree(hc,i)
})
colnames(clus_results) = paste0("K",Ks)
clustree(clus_results, prefix = "K")
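If you need the mapping itself rather than a picture, base R may be enough, because cuts of the same hclust tree are nested: every cluster at a finer level sits inside exactly one cluster at a coarser level. Here is a minimal sketch; parent_of is a hypothetical helper written for this example, and the object names cuts and tab are likewise my own.

```r
data(USArrests)
hc <- hclust(dist(scale(USArrests), method = "euclidean"), method = "ward.D2")

# cutree() accepts a vector of k and returns one column per value of k,
# with the values of k used as column names
cuts <- cutree(hc, k = c(2, 4, 8, 12))

# Rows are the k = 2 clusters, columns are the k = 4 clusters. Because the
# cuts are nested, each column has a single non-zero cell marking its parent.
tab <- table(k2 = cuts[, "2"], k4 = cuts[, "4"])
print(tab)

# Hypothetical helper: for each fine cluster, the id of the coarse cluster
# that contains it (assumes the two partitions are nested)
parent_of <- function(coarse, fine) {
  tapply(coarse, fine, function(x) unique(x))
}

parent_of(cuts[, "2"], cuts[, "4"])  # parent (k = 2) of each k = 4 cluster
parent_of(cuts[, "4"], cuts[, "8"])  # parent (k = 4) of each k = 8 cluster
```

The names of the returned vector are the child cluster ids and the values are the parent ids, so chaining the calls level by level traces any small cluster back to the top.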
Upvotes: 2