How to create a dendrogram colored by clusters with hclust and cutreeDynamic

I'm working on a clustering problem. I want to use hclust to build the dendrogram and cutreeDynamic to cut that dendrogram into clusters, and I already have that part working:

library(dynamicTreeCut)

# Keep only the two classes of interest and attach the class variable
omicData_clustering <- omicData
omicData_clustering[[classVariable]] <- clinicDataSVM[[classVariable]]
omicData_clustering <- omicData_clustering[omicData_clustering[[classVariable]] %in% c(changedClass, class), ]

# Drop the id column, then the class column, leaving only numeric features.
# Logical indexing is used because -which() would drop every column if nothing matched.
omicData_clustering <- omicData_clustering[, !(names(omicData_clustering) %in% idColumn), drop = FALSE]
omicData_num <- omicData_clustering[, !(names(omicData_clustering) %in% classVariable), drop = FALSE]

# Scale the data
omicData_clustering_scaled <- scale(omicData_num)

# Distance matrix (named dist_obj so it does not shadow the dist() function)
dist_obj <- dist(omicData_clustering_scaled)

# Hierarchical clustering
hc <- hclust(dist_obj, method = "complete")

# Number of changed-class samples, used as the minimum cluster size
num <- sum(clinicDataSVM[[classVariable]] == changedClass)

# Dynamic clusters
dynamic_clusters <- cutreeDynamic(hc, distM = as.matrix(dist_obj), minClusterSize = num)

# Keep a label only at the positions of changed-class samples
labels <- omicData_clustering[[classVariable]]
labels[labels != changedClass] <- ""

Here, dynamic_clusters has values like the following:

> dynamic_clusters
7 1 4 1 1 3 7 2 4 6 1 1 2 3 3 2 1 2 6 1 1 3 1 2 2 7 1 6 7 1 1 2 1 6 3 7 1 2 7 1 5 2 6 6 7 2 6 6 5 7 3 1 6 5 1 2 2 6 2 1 6 7 4 6 2 1 4 1 6 5 4 4 7 1 
4 1 5 1 1 6 4 2 5 3 1 1 2 6 6 2 1 2 3 1 1 6 1 2 2 4 1 3 4 1 1 2 1 3 6 4 1 2 4 1 7 2 3 3 4 2 3 3 7 4 6 1 3 7 1 2 2 3 2 1 3 4 5 3 2 1 5 1 3 7 5 5 4 1 
4 7 2 2 1 1 5 1 6 3 4 6 7 5 2 7 5 6 5 1 4 4 7 3 5 2 4 2 6 2 7 1 1 1 2 7 2 2 6 7 6 3 6 7 1 5 2 7 4 2 1 3 7 6 1 4 6 2 2 5 7 3 7 2 7 2 6 1 6 6 1 6 1 1 
5 4 2 2 1 1 7 1 3 6 5 3 4 7 2 4 7 3 7 1 5 5 4 6 7 2 5 2 3 2 4 1 1 1 2 4 2 2 3 4 3 6 3 4 1 7 2 4 5 2 1 6 4 3 1 5 3 2 2 7 4 6 4 2 4 2 3 1 3 3 1 3 1 1 
3 7 6 4 7 4 2 2 7 7 7 4 4 5 2 3 4 1 2 4 1 1 3 6 2 6 2 
6 4 3 5 4 5 2 2 4 4 4 5 5 7 2 6 5 1 2 5 1 1 6 3 2 3 2 

And in labels, I have the following:

> labels
  [1] ""             ""             ""             ""             ""             ""             "Control2Case" ""             ""            
 [10] ""             ""             ""             "Control2Case" ""             ""             ""             ""             ""            
 [19] ""             ""             ""             ""             ""             ""             ""             ""             ""            
 [28] ""             ""             ""             "Control2Case" ""             ""             ""             ""             ""            
 [37] ""             ""             ""             ""             "Control2Case" ""             ""             "Control2Case" ""            
 [46] ""             ""             ""             "Control2Case" ""             ""             ""             ""             ""            
 [55] ""             ""             ""             ""             ""             ""             ""             "Control2Case" ""            
 [64] ""             ""             ""             ""             ""             ""             ""             ""             ""            
 [73] ""             ""             ""             ""             ""             ""             ""             ""             ""            
 [82] ""             ""             ""             ""             ""             ""             ""             ""             ""            
 [91] ""             ""             ""             ""             ""             ""             ""             ""             ""            
[100] ""             ""             ""             ""             ""             ""             ""             ""             ""            
[109] ""             ""             ""             ""             ""             ""             "Control2Case" ""             ""            
[118] ""             ""             ""             ""             ""             ""             ""             ""             ""            
[127] ""             ""             ""             ""             ""             ""             ""             ""             "Control2Case"
[136] ""             ""             ""             ""             ""             ""             ""             ""             ""            
[145] ""             ""             ""             ""             ""             ""             ""             ""             ""            
[154] ""             ""             ""             ""             ""             ""             ""             ""             ""            
[163] ""             ""             ""             ""             ""             ""             ""             ""             ""            
[172] ""             ""             ""             ""

The issue is that I want to draw the dendrogram colored by cluster and identify which clusters the 'Control2Case' samples fall into. Is that possible?
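Independently of the plot, the cluster membership of the changed-class samples can be read off with a cross-tabulation, since the cluster vector and the class vector are both in the row order of the data. The following is only a minimal sketch on made-up data: toy, toy_class and toy_clusters are hypothetical names, and base R's cutree() stands in for cutreeDynamic() so that the example runs without extra packages.

```r
# Toy data standing in for the scaled omics matrix (hypothetical values).
set.seed(42)
toy <- matrix(rnorm(30 * 4), nrow = 30)
toy_class <- rep("Control", 30)
toy_class[c(2, 11, 25)] <- "Control2Case"

toy_hc <- hclust(dist(scale(toy)), method = "complete")

# cutree() is a base-R stand-in; substitute the cutreeDynamic() result here.
toy_clusters <- cutree(toy_hc, k = 3)

# Rows: cluster id. Columns: whether the sample belongs to the changed class.
membership <- table(cluster = toy_clusters,
                    changed = toy_class == "Control2Case")
print(membership)

# Cluster id of each changed-class sample, by row position in the data.
changed_idx <- which(toy_class == "Control2Case")
print(data.frame(sample = changed_idx, cluster = toy_clusters[changed_idx]))
```

With the variables from the question, the equivalent call would presumably be table(dynamic_clusters, labels == changedClass).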

I have entered the following code (from https://cran.r-project.org/web/packages/dendextend/vignettes/dendextend.html):

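library(dendextend) # branches_attr_by_clusters(), color_labels(), colored_bars(); also re-exports %>%
library(dynamicTreeCut)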
data(iris)
x  <- iris[,-5] %>% as.matrix
hc <- x %>% dist %>% hclust
dend <- hc %>% as.dendrogram 

# Find special clusters:
clusters <- cutreeDynamic(hc, distM = as.matrix(dist(x)), method = "tree")
# we need to sort them to the order of the dendrogram:
clusters <- clusters[order.dendrogram(dend)]
clusters_numbers <- unique(clusters) - (0 %in% clusters)
n_clusters <- length(clusters_numbers)

library(colorspace)
cols <- rainbow_hcl(n_clusters)
true_species_cols <- rainbow_hcl(3)[as.numeric(iris[,][order.dendrogram(dend),5])]
dend2 <- dend %>% 
         branches_attr_by_clusters(clusters, values = cols) %>% 
         color_labels(col =   true_species_cols)
plot(dend2)
clusters <- factor(clusters)
levels(clusters)[-1]  <- cols[-5][c(1,4,2,3)] 
   # Get the clusters to have proper colors.
   # fix the order of the colors to match the branches.
colored_bars(clusters, dend, sort_by_labels_order = FALSE)

But I don't know how to adapt it to my problem: some lines in that code are specific to the iris data, and I don't understand why they are there.
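For orientation, the only step in the vignette code that carries over unchanged is the reordering with order.dendrogram(): the cluster vector comes back in the row order of the data, while the plot lists leaves left to right. The true_species_cols line and the hand-picked permutation c(1,4,2,3) appear to be tuned to the iris data. Below is a minimal base-R sketch of the general idea, under stated assumptions: the data are made up, the names x, class_labels, leaf_cols and leaf_text are hypothetical, and cutree() again stands in for cutreeDynamic(). Each leaf label is colored by its cluster and kept only for changed-class samples.

```r
# Toy data (hypothetical); replace with the scaled omics matrix.
set.seed(1)
x <- matrix(rnorm(40 * 5), nrow = 40)
rownames(x) <- paste0("s", seq_len(40))
class_labels <- rep("Control", 40)
class_labels[c(3, 17, 29)] <- "Control2Case"

hc <- hclust(dist(scale(x)), method = "complete")
clusters <- cutree(hc, k = 4) # stand-in for cutreeDynamic(); row order of x

dend <- as.dendrogram(hc)
leaf_order <- order.dendrogram(dend) # row indices in left-to-right plot order

# Same information, reordered to match the plot.
clusters_in_plot_order <- clusters[leaf_order]
changed_positions <- which(class_labels[leaf_order] == "Control2Case")

# One color per cluster, looked up by sample name.
palette_cols <- hcl.colors(4, "Dark 3")
leaf_cols <- setNames(palette_cols[clusters], rownames(x))
leaf_text <- setNames(ifelse(class_labels == "Control2Case", "Control2Case", ""),
                      rownames(x))

# Visit every node; on leaves, set the label color and blank out
# the labels of samples that are not in the changed class.
colored <- dendrapply(dend, function(node) {
  if (is.leaf(node)) {
    id <- attr(node, "label")
    attr(node, "nodePar") <- list(lab.col = leaf_cols[[id]], pch = NA)
    attr(node, "label") <- leaf_text[[id]]
  }
  node
})

plot(colored)
```

Here changed_positions gives the x-axis positions of the changed-class leaves and clusters_in_plot_order[changed_positions] their cluster ids. With dendextend, the same reordered vector is what branches_attr_by_clusters() and colored_bars() expect in the vignette code.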

Upvotes: 0

Views: 51

Answers (1)

Yun

Reputation: 305

Another simple way to draw a colored dendrogram is ggalign, which needs no complex code. In the development version of ggalign, I've introduced a new cutree argument that lets users apply any custom function for tree cutting. Just replace the iris data with your own. The result is a ggplot-like object, so you can color the branches by mapping an aesthetic with aes(); all you need to be familiar with is ggplot2.

align_dendro initializes the ggplot data and mapping.

The default ggplot data is the node coordinates. In addition, a geom_segment layer is added whose data holds the edge coordinates of the tree segments. The node and edge data contain the following columns:

  • index: the original index in the tree for the current node.
  • label: the node label text.
  • x and y: the x-axis and y-axis coordinates of the current node, or of the start node of the current edge.
  • xend and yend: the x-axis and y-axis coordinates of the terminal node of the current edge.
  • branch: the branch the current node or edge belongs to. You can use this column to color different groups.
  • panel: the panel the current node belongs to. If the plot is split into panels using facet_grid, this column shows which panel the current node or edge comes from. Note: some nodes may fall outside any panel (between two panels), so this column can contain NA values.
  • .panel: similar to the panel column, but always gives the correct branch for use with ggplot faceting.
  • panel1 and panel2: same role as panel, but specific to the edge data; they correspond to the two nodes of each edge.
  • leaf: a logical value indicating whether the current node is a leaf.

library(ggalign)
#> Loading required package: ggplot2
# Initialize a vertical stack layout. The layout controls how multiple plots
# are arranged; here it only sets the dendrogram orientation.
ggstack(iris[, -5L], "v") +
    align_dendro( # add a dendrogram
        aes(color = branch), # color the branch
        cutree = function(tree, dist, k, h) {
            dynamicTreeCut::cutreeDynamic(tree, distM = dist, method = "tree")
        }
    ) +
    scale_y_continuous(expand = expansion()) + # remove y-axis expansion
    scale_color_brewer(palette = "Dark2") + # set color
    theme(axis.text.x = element_text(angle = -90, hjust = 0))

Created on 2024-10-13 with reprex v2.1.0


Upvotes: 0
