Reputation: 1044
The workflow I want to implement is:
dm <- dist(data)
dend <- hclust(dm)
k <- stats::cutree(dend, k = 10)
data$clusters <- k
plot(dend, colorBranches = k)  # ??? pseudocode: what can I use here?
So I searched for "color dendrogram branches using cutree output". All I found is dendextend.
The problem is that I am failing to implement the workflow with dendextend.
This is what I came up with, but I would now also like to have the cluster labels shown:
library(dendextend)
hc <- hclust(dist(USArrests))
dend <- as.dendrogram(hc)
kcl <- dendextend::cutree(dend, k = 4)
dend1 <- color_branches(dend, clusters = kcl[order.dendrogram(dend)], groupLabels = TRUE) %>%
  set("labels_cex", 1)
plot(dend1, main = "Dendrogram dist JK")
Also, trying something like groupLabels = 1:4 does not help.
When I specify the param k (number of clusters) instead, groupLabels does work. Unfortunately, the labels then differ from those generated by dendextend's own cutree method.
Note that here cluster 4 has 2 members.
> table(kcl)
kcl
 1  2  3  4
14 14 20  2
This post suggests using dendextend::cutree(dend, k = nrCluster, order_clusters_as_data = FALSE):
r dendrogram - groupLabels not match real labels (package dendextend)
But then I cannot use the output of dendextend::cutree to group the data, since the ordering does not match.
I would be happy to use a different dendrogram plotting library in R but so far my Web searches for "coloring dendrogram branches by cutree output" point to the dendextend package.
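For reference, the workflow above can also be sketched in base R alone, without an extra plotting package. This is only an illustration under some assumptions: it uses USArrests as stand-in data, the names cluster_cols and color_node are made up for the example, and the colours are arbitrary. It relies on stats::cutree returning a vector named by row, so each subtree can be coloured when all of its leaves fall in a single cluster.

```r
# Base R only: colour dendrogram edges from a stats::cutree() vector.
hc <- hclust(dist(USArrests))
k <- stats::cutree(hc, k = 4)   # named integer vector, in the row order of the data

# Hypothetical palette: one arbitrary colour per cluster id.
cluster_cols <- c("red", "blue", "darkgreen", "orange")

# Hypothetical helper: colour the edge leading to a node when every leaf
# below that node belongs to the same cluster.
color_node <- function(node) {
  ids <- unique(k[labels(node)])
  if (length(ids) == 1) {
    attr(node, "edgePar") <- c(attr(node, "edgePar"), list(col = cluster_cols[ids]))
  }
  node
}

dend <- dendrapply(as.dendrogram(hc), color_node)

# Because k is in data order, it can be attached to the data directly.
clustered <- cbind(USArrests, cluster = k)

plot(dend, main = "Branches coloured by cutree() output")
legend("topright", legend = paste("cluster", 1:4), col = cluster_cols, lty = 1)
```

The legend takes the place of group labels under the branches, so the cluster ids shown in the plot are the same ids stored in clustered$cluster.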
Upvotes: 0
Views: 172
Reputation: 25306
I'm sorry, but I'm not sure I fully understand your question.
It seems you want to align cutree's output with your original data.
If that's the case, then you need to use dendextend::cutree(dend, k = nrCluster, order_clusters_as_data = TRUE),
e.g.:
require(dendextend)
d1 <- USArrests[1:10,]
hc <- hclust(dist(d1))
dend <- as.dendrogram(hc)
k <- dendextend::cutree(dend, k = 3, order_clusters_as_data = TRUE)
d2 <- cbind(d1, k)
plot(color_branches(dend, 3))
d2
# an easier way to see the clusters is by ordering the rows of the data based on the order of the dendrogram
d2[order.dendrogram(dend),]
The plot is fine:
And the clusters are mapped correctly to the data (see the output below):
> require(dendextend)
> d1 <- USArrests[1:10,]
> hc <- hclust(dist(d1))
> dend <- as.dendrogram(hc)
> k <- dendextend::cutree(dend, k = 3, order_clusters_as_data = TRUE)
> d2 <- cbind(d1, k)
> plot(color_branches(dend, 3))
> d2
            Murder Assault UrbanPop Rape k
Alabama       13.2     236       58 21.2 1
Alaska        10.0     263       48 44.5 1
Arizona        8.1     294       80 31.0 2
Arkansas       8.8     190       50 19.5 1
California     9.0     276       91 40.6 2
Colorado       7.9     204       78 38.7 1
Connecticut    3.3     110       77 11.1 3
Delaware       5.9     238       72 15.8 1
Florida       15.4     335       80 31.9 2
Georgia       17.4     211       60 25.8 1
> # an easier way to see the clusters is by ordering the rows of the data based on the order of the dendrogram
> d2[order.dendrogram(dend),]
            Murder Assault UrbanPop Rape k
Connecticut    3.3     110       77 11.1 3
Florida       15.4     335       80 31.9 2
Arizona        8.1     294       80 31.0 2
California     9.0     276       91 40.6 2
Arkansas       8.8     190       50 19.5 1
Colorado       7.9     204       78 38.7 1
Georgia       17.4     211       60 25.8 1
Alaska        10.0     263       48 44.5 1
Alabama       13.2     236       58 21.2 1
Delaware       5.9     238       72 15.8 1
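If the goal is for the ids stored in the data to match the group labels drawn under the branches, one possible sketch is to cut with order_clusters_as_data = FALSE (as the post linked in the question suggests) and then attach the ids to the data by row name rather than by position. This assumes the vector returned by dendextend::cutree is named by leaf label; the names k_plot and clustered are made up for the example, and whether the ids line up with the drawn labels is worth checking against the plot.

```r
library(dendextend)

hc <- hclust(dist(USArrests))
dend <- as.dendrogram(hc)

# Cluster ids in dendrogram order (assumption: the result is named by leaf label).
k_plot <- dendextend::cutree(dend, k = 4, order_clusters_as_data = FALSE)

# Join by row name, so the order of k_plot does not matter.
clustered <- USArrests
clustered$cluster <- unname(k_plot[rownames(clustered)])

# Let color_branches do its own cut with k and draw the group labels.
plot(color_branches(dend, k = 4, groupLabels = TRUE))
table(clustered$cluster)
```

Joining by name sidesteps the ordering mismatch: the same partition ends up in the data whichever order the cut vector comes back in.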
Please let me know if this answers your question or if you have follow-up questions.
Upvotes: 0