Reputation: 6396
I've created a classification and divided the iris dataset into three classes. Now I would like to link the classes (colors) to the observations in the dataset. I tried the cutree function. As a result, I get classes numbered 1 to 3 and branches numbered 1 to 3, but they do not match: the first class is the third branch, the second class is the first branch, and the third class is the second branch. How can I correctly link the output classes (from cutree) to the branches in the plot?
> library('dendextend')
> library('tidyverse')
> iris <- datasets::iris
> iris2 <- iris[,-5]
> d_iris <- dist(iris2)
> hc_iris <- hclust(d_iris, method = "complete")
> dend <- as.dendrogram(hc_iris)
> dend <- color_branches(dend, h = 3.5)
> dend <- color_labels(dend, h = 3.5)
> plot(dend)
> cuts <- cutree(dend, h=3.5)
> data_frame(class=cuts, obj=as.numeric(names(cuts))) %>%
+ group_by(class) %>%
+ summarise(n())
# A tibble: 3 × 2
  class `n()`
  <int> <int>
1     1    50
2     2    72
3     3    28
> plot(cut(dend, h=3.5)$upper)
Upvotes: 1
Views: 347
Reputation: 41
The cutree function in the dendextend package has a logical argument called order_clusters_as_data, which controls whether the clusters are ordered by the order of the original data (TRUE) or by the order of the labels on the dendrogram (FALSE). The default is TRUE, but since the cut function numbers the branches by their order on the dendrogram, you want order_clusters_as_data = FALSE:
cuts <- cutree(dend, h=3.5, order_clusters_as_data=FALSE)
tibble(class=cuts, obj=as.numeric(names(cuts))) %>%
  group_by(class) %>%
  summarise(n())
# A tibble: 3 × 2
  class `n()`
  <int> <int>
1     1    72
2     2    28
3     3    50
plot(cut(dend, h=3.5)$upper)
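To then attach these classes back to the individual observations, something like the following should work. It is only a minimal sketch: it assumes the names of the vector returned by cutree are the row labels of the data (as the as.numeric(names(cuts)) call above already relies on), and the name linked is just an illustrative choice.

```r
library(dendextend)

# Rebuild the dendrogram from the question
iris2 <- datasets::iris[, -5]
dend <- as.dendrogram(hclust(dist(iris2), method = "complete"))

# Cluster ids numbered by position along the dendrogram
cuts <- cutree(dend, h = 3.5, order_clusters_as_data = FALSE)

# Assumption: names(cuts) are the row labels of iris2 ("1" to "150"),
# so they can be used to look up the original rows
obs <- as.integer(names(cuts))

# One row per observation: row number, class from cutree, known species
linked <- data.frame(
  obs     = obs,
  class   = unname(cuts),
  species = datasets::iris$Species[obs]
)

# Cross-tabulate classes against species to see what each branch contains
table(linked$class, linked$species)
```

Because the lookup goes through the names rather than the position in the vector, it does not matter that cuts is now in dendrogram order instead of data order.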
Upvotes: 1