Jot eN

Reputation: 6396

How to link a dendrogram branch to a class?

I've clustered the iris dataset and divided it into three classes. Now I would like to link the classes (colors) to the observations in the dataset. I tried the cutree function. It gives me classes numbered 1 to 3, and the plot has branches numbered 1 to 3, but the numbering does not match: the first class is the third branch, the second class is the first branch, and the third class is the second branch. How can I correctly link the output classes (from cutree) to the branches in the plot?

> library('dendextend')
> library('tidyverse')
> iris <- datasets::iris
> iris2 <- iris[,-5]
> d_iris <- dist(iris2)
> hc_iris <- hclust(d_iris, method = "complete")
> dend <- as.dendrogram(hc_iris)
> dend <- color_branches(dend, h = 3.5)
> dend <- color_labels(dend, h = 3.5)
> plot(dend)


> cuts <- cutree(dend, h=3.5)
> tibble(class=cuts, obj=as.numeric(names(cuts))) %>%
+         group_by(class) %>%
+         summarise(n())
# A tibble: 3 × 2
  class `n()`
  <int> <int>
1     1    50
2     2    72
3     3    28
> plot(cut(dend, h=3.5)$upper)


Upvotes: 1

Views: 347

Answers (1)

mshea

Reputation: 41

The cutree function in the dendextend package has a logical argument called order_clusters_as_data. When it is TRUE (the default), the clusters are ordered by the order of the original data; when it is FALSE, they are ordered by the order of the labels on the dendrogram. Since the cut function numbers the branches in dendrogram order, you want order_clusters_as_data = FALSE:

cuts <- cutree(dend, h=3.5, order_clusters_as_data=FALSE)
tibble(class=cuts, obj=as.numeric(names(cuts))) %>%
     group_by(class) %>%
     summarise(n())
# A tibble: 3 × 2
  class `n()`
  <int> <int>
1     1    72
2     2    28
3     3    50
plot(cut(dend, h=3.5)$upper)
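To attach these branch numbers to the rows of the dataset, something like the following should work. It is only a sketch: it assumes the names of the cutree result are the row indices of iris (which is what the as.numeric(names(cuts)) call above relies on), and the names branch and iris_branch are made up for illustration.

```r
library(dendextend)

# Rebuild the dendrogram from the question
dend <- as.dendrogram(hclust(dist(iris[, -5]), method = "complete"))

# Cluster ids follow the dendrogram order; the returned vector is also
# in dendrogram order, named by observation label
cuts <- cutree(dend, h = 3.5, order_clusters_as_data = FALSE)

# Put the ids back into original row order
# (assumes the names are the row indices "1", "2", ...)
branch <- cuts[order(as.numeric(names(cuts)))]

# Attach the branch id to each observation
# (iris_branch is a hypothetical name)
iris_branch <- data.frame(iris, branch = unname(branch))

# Cross-tabulate branches against species
table(iris_branch$branch, iris_branch$Species)
```

The cluster numbering stays in dendrogram order here; only the vector is rearranged so that element i belongs to row i of iris.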

Upvotes: 1
