Reputation: 594
When performing hierarchical clustering in R with the hclust function, how do you know the height of the final merge?
So to clarify with some R default data:
hc <- hclust(dist(USArrests))
dendrogram1 <- as.dendrogram(hc)
plot(hc)
This results in a variable hc holding all the clustering info.
And the dendrogram:
As you can see on the dendrogram, the final merge happens at a height > 200 (about 300). But how does the dendrogram know? As far as I can tell, this info is neither in hc$height nor in the dendrogram1 variable. The highest merge I can find is at 169.
If the dendrogram1 variable does not contain this information, how does the plot function know the merge must occur at a height of 300?
I am asking because I need this number (roughly 300) for other applications, and reading it off the plot is downright impractical.
Thanks in advance to anyone willing to help!
Upvotes: 2
Views: 6677
Reputation: 25376
@rcs's answer is correct.
Here is another way to solve it, using the get_nodes_attr function from the dendextend package:
# install.packages("dendextend")
library(dendextend)
dend <- as.dendrogram(hclust(dist(USArrests[1:5,])))
# Equivalently, using the pipe:
# dend <- USArrests[1:5,] %>% dist %>% hclust %>% as.dendrogram
# The height for all nodes:
get_nodes_attr(dend, "height")
And we can easily see the height for each node:
> get_nodes_attr(dend, "height")
[1] 108.85192 0.00000 63.00833 23.19418 0.00000 0.00000 37.17701 0.00000 0.00000
For more details on the package, you can have a look at its vignette.
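If only the final merge is needed, base R can likely give it without an extra package. The following is a minimal sketch, assuming the same five-row subset as above; the names hc5, dend5 and root_height are made up for illustration. It relies on the root node of a dendrogram object carrying a "height" attribute, which should match the first value returned by get_nodes_attr.

```r
# Same five-row subset as in the dendextend example above
hc5 <- hclust(dist(USArrests[1:5, ]))
dend5 <- as.dendrogram(hc5)

# The root node of the dendrogram stores the height of the last merge
root_height <- attr(dend5, "height")
print(root_height)

# Cross-check against the merge heights stored in the hclust object
print(all.equal(root_height, max(hc5$height)))
```

The same attr() call works on any sub-branch (for example dend5[[1]]), so it can also be used to read the merge height of an individual subtree.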
Upvotes: 3
Reputation: 68849
These values can be calculated with stats::cophenetic():
The cophenetic distance between two observations that have been clustered is defined to be the intergroup dissimilarity at which the two observations are first combined into a single cluster.
This yields the following for your example:
sort(unique(cophenetic(hc)))
# [1] 2.291 3.834 3.929 6.237 6.638 7.355 8.027 8.538 10.860
# [10] 11.456 12.425 12.614 12.775 13.045 13.297 13.349 13.896 14.501
# [19] 15.408 15.454 15.630 15.890 16.977 18.265 19.438 19.904 21.167
# [28] 22.366 22.767 24.894 25.093 28.635 29.251 31.477 31.620 32.719
# [37] 36.735 36.848 38.528 41.488 48.725 53.593 57.271 64.994 68.762
# [46] 87.326 102.862 168.611 293.623
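The largest value in the sorted list above is the height of the final merge. As a minimal sketch (the variable names final_height and final_from_coph are chosen here for illustration), it can be pulled out as a single number, either from the cophenetic distances or from the height component of the hclust object, which holds one entry per merge:

```r
hc <- hclust(dist(USArrests))

# Largest cophenetic distance = height at which the last two clusters merge
final_from_coph <- max(cophenetic(hc))

# The hclust object stores one height per merge (n - 1 values for n rows);
# max() is used rather than the last element so that this also holds for
# linkage methods whose heights are not guaranteed to be increasing
final_height <- max(hc$height)

print(length(hc$height))   # number of merges
print(final_height)        # about 293.62 for this data
print(all.equal(final_height, final_from_coph))
```

Note that printing hc$height in full shows all 49 merge heights, so if the largest visible value seems to be around 169, it is worth checking that the last element was not overlooked.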
Upvotes: 7