Reputation: 25
I'm working on a clustering problem. I use hclust to build the dendrogram and cutreeDynamic to derive clusters from it, and that part already works:
# Preprocessing data for only numeric features
omicData_clustering <- omicData
omicData_clustering[[classVariable]] <- clinicDataSVM[[classVariable]]
omicData_clustering <- omicData_clustering[omicData_clustering[[classVariable]] %in% c(changedClass, class), ]
omicData_clustering <- omicData_clustering[,
-which(names(omicData_clustering) %in% c(idColumn))]
omicData_num <- omicData_clustering[,
-which(names(omicData_clustering) %in% c(classVariable))]
# scale the data
omicData_clustering_scaled <- scale(omicData_num)
# getting dist
dist <- dist(omicData_clustering_scaled)
# doing hclust
hc <- hclust(dist, method = "complete")
# number of changed class for the minimum cluster size
num <- sum(clinicDataSVM[[classVariable]] == changedClass)
# getting dynamic clusters
dynamic_clusters <- cutreeDynamic(hc, distM = as.matrix(dist), minClusterSize = num)
# getting only changed class labels position
labels <- omicData_clustering[[classVariable]]
labels[labels != changedClass] <- ""
Where 'dynamic_clusters' has the following value, for example:
> dynamic_clusters
7 1 4 1 1 3 7 2 4 6 1 1 2 3 3 2 1 2 6 1 1 3 1 2 2 7 1 6 7 1 1 2 1 6 3 7 1 2 7 1 5 2 6 6 7 2 6 6 5 7 3 1 6 5 1 2 2 6 2 1 6 7 4 6 2 1 4 1 6 5 4 4 7 1
4 1 5 1 1 6 4 2 5 3 1 1 2 6 6 2 1 2 3 1 1 6 1 2 2 4 1 3 4 1 1 2 1 3 6 4 1 2 4 1 7 2 3 3 4 2 3 3 7 4 6 1 3 7 1 2 2 3 2 1 3 4 5 3 2 1 5 1 3 7 5 5 4 1
4 7 2 2 1 1 5 1 6 3 4 6 7 5 2 7 5 6 5 1 4 4 7 3 5 2 4 2 6 2 7 1 1 1 2 7 2 2 6 7 6 3 6 7 1 5 2 7 4 2 1 3 7 6 1 4 6 2 2 5 7 3 7 2 7 2 6 1 6 6 1 6 1 1
5 4 2 2 1 1 7 1 3 6 5 3 4 7 2 4 7 3 7 1 5 5 4 6 7 2 5 2 3 2 4 1 1 1 2 4 2 2 3 4 3 6 3 4 1 7 2 4 5 2 1 6 4 3 1 5 3 2 2 7 4 6 4 2 4 2 3 1 3 3 1 3 1 1
3 7 6 4 7 4 2 2 7 7 7 4 4 5 2 3 4 1 2 4 1 1 3 6 2 6 2
6 4 3 5 4 5 2 2 4 4 4 5 5 7 2 6 5 1 2 5 1 1 6 3 2 3 2
And in labels, I have the following:
> labels
[1] "" "" "" "" "" "" "Control2Case" "" ""
[10] "" "" "" "Control2Case" "" "" "" "" ""
[19] "" "" "" "" "" "" "" "" ""
[28] "" "" "" "Control2Case" "" "" "" "" ""
[37] "" "" "" "" "Control2Case" "" "" "Control2Case" ""
[46] "" "" "" "Control2Case" "" "" "" "" ""
[55] "" "" "" "" "" "" "" "Control2Case" ""
[64] "" "" "" "" "" "" "" "" ""
[73] "" "" "" "" "" "" "" "" ""
[82] "" "" "" "" "" "" "" "" ""
[91] "" "" "" "" "" "" "" "" ""
[100] "" "" "" "" "" "" "" "" ""
[109] "" "" "" "" "" "" "Control2Case" "" ""
[118] "" "" "" "" "" "" "" "" ""
[127] "" "" "" "" "" "" "" "" "Control2Case"
[136] "" "" "" "" "" "" "" "" ""
[145] "" "" "" "" "" "" "" "" ""
[154] "" "" "" "" "" "" "" "" ""
[163] "" "" "" "" "" "" "" "" ""
[172] "" "" "" ""
What I want now is to draw the dendrogram with the clusters marked and to identify which clusters the 'Control2Case' samples fall into. Is that possible?
I tried the following code (from https://cran.r-project.org/web/packages/dendextend/vignettes/dendextend.html):
library(dendextend) # provides %>%, branches_attr_by_clusters(), color_labels(), colored_bars()
library(dynamicTreeCut)
data(iris)
x <- iris[,-5] %>% as.matrix
hc <- x %>% dist %>% hclust
dend <- hc %>% as.dendrogram
# Find special clusters:
clusters <- cutreeDynamic(hc, distM = as.matrix(dist(x)), method = "tree")
# we need to sort them to the order of the dendrogram:
clusters <- clusters[order.dendrogram(dend)]
clusters_numbers <- unique(clusters) - (0 %in% clusters)
n_clusters <- length(clusters_numbers)
library(colorspace)
cols <- rainbow_hcl(n_clusters)
true_species_cols <- rainbow_hcl(3)[as.numeric(iris[,][order.dendrogram(dend),5])]
dend2 <- dend %>%
branches_attr_by_clusters(clusters, values = cols) %>%
color_labels(col = true_species_cols)
plot(dend2)
clusters <- factor(clusters)
levels(clusters)[-1] <- cols[-5][c(1,4,2,3)]
# Get the clusters to have proper colors.
# fix the order of the colors to match the branches.
colored_bars(clusters, dend, sort_by_labels_order = FALSE)
But I don't know how to adapt it to my problem. Some lines in this code are specific to the iris data, and I don't understand why they are there.
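For reference, the iris-specific parts of the vignette code are the `true_species_cols` line (it reads the Species column, column 5) and the hand-tuned `cols[-5][c(1,4,2,3)]` permutation. The general step underneath is reordering per-sample values into the dendrogram's leaf order. Below is a minimal sketch of that step in base R only. The data, the flagged row indices, and the names `sample_labels`, `leaf_order`, and `label_cols` are made up for illustration and are not taken from the vignette.

```r
# Synthetic stand-in data; replace with your own scaled matrix and labels.
set.seed(42)
x <- scale(matrix(rnorm(30 * 4), nrow = 30))
sample_labels <- rep("", 30)
sample_labels[c(5, 12, 20)] <- "Control2Case"  # hypothetical flagged rows

hc <- hclust(dist(x), method = "complete")
dend <- as.dendrogram(hc)

# Leaves are drawn left to right in this order, not in row order.
leaf_order <- order.dendrogram(dend)

# One colour per leaf, in drawing order. This plays the role of the
# iris-specific `true_species_cols` line in the vignette code.
label_cols <- ifelse(sample_labels[leaf_order] == "Control2Case",
                     "red", "grey60")
```

With dendextend loaded, `labels_colors(dend) <- label_cols` should then apply these colours to the leaf labels, assuming the vector is in leaf order as above.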
Upvotes: 0
Views: 51
Reputation: 305
Another simple way to draw a dendrogram with colors is to use ggalign, which needs no complex code. In the development version of ggalign, I've introduced a new cutree argument that lets users apply any custom function for tree cutting. Just replace the iris data with your own. The result is a ggplot-like object, so you can color the branches simply by mapping an aesthetic with aes. All you need to be familiar with is ggplot2.
align_dendro initializes a ggplot data and mapping.
The default ggplot data is the node coordinates. In addition, a geom_segment layer with the data of the tree segment (edge) coordinates is added. The node and edge coordinates contain the following columns:
index: the original index in the tree for the current node.
label: the node label text.
x and y: the x-axis and y-axis coordinates of the current node, or of the start node of the current edge.
xend and yend: the x-axis and y-axis coordinates of the terminal node of the current edge.
branch: which branch the current node or edge belongs to. You can use this column to color different groups.
panel: which panel the current node is in. If the plot is split into panels using ggplot2::facet_grid, this column shows which panel the current node or edge is from. Note: some nodes may fall outside any panel (between two panels), so this column can contain NA values.
.panel: similar to the panel column, but always gives the correct branch for use with ggplot facets.
panel1 and panel2: same purpose as panel, but specific to the edge data; they correspond to the two nodes of each edge.
leaf: a logical value indicating whether the current node is a leaf.
library(ggalign)
#> Loading required package: ggplot2
# Initialize a vertical stack layout. With a single plot, the direction
# only sets the dendrogram orientation.
ggstack(iris[, -5L], "v") +
  align_dendro( # add a dendrogram
    aes(color = branch), # color the branches
    cutree = function(tree, dist, k, h) {
      dynamicTreeCut::cutreeDynamic(tree, distM = dist, method = "tree")
    }
  ) +
  scale_y_continuous(expand = expansion()) + # remove y-axis expansion
  scale_color_brewer(palette = "Dark2") + # set colors
  theme(axis.text.x = element_text(angle = -90, hjust = 0))
Created on 2024-10-13 with reprex v2.1.0
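To answer the "which clusters do the 'Control2Case' samples fall into" part without any plotting, a cross-tabulation of the cluster vector against the labels may be enough. This is a minimal sketch under stated assumptions: the data are synthetic, the flagged row indices are made up, and base R's `cutree` stands in for `cutreeDynamic` so the example runs without extra packages. With real data you would use your own `dynamic_clusters` and `labels` vectors, which must be in the same row order.

```r
# Synthetic stand-in for the scaled omics matrix (40 samples, 5 features).
set.seed(1)
x <- scale(matrix(rnorm(40 * 5), nrow = 40))
hc <- hclust(dist(x), method = "complete")

# Stand-in for cutreeDynamic(): one cluster id per sample, in row order.
clusters <- cutree(hc, k = 4)

# Hypothetical labels: "" everywhere except a few flagged samples.
sample_labels <- rep("", 40)
sample_labels[c(3, 11, 27)] <- "Control2Case"

# Cluster of each flagged sample.
flagged <- which(sample_labels == "Control2Case")
membership <- data.frame(sample = flagged, cluster = clusters[flagged])

# Flagged vs. unflagged counts per cluster.
counts <- table(cluster = clusters,
                flagged = sample_labels == "Control2Case")

print(membership)
print(counts)
```

Because both vectors are indexed by row, no reordering is needed here. Reordering by the dendrogram's leaf order only matters once the values are drawn next to the tree.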
Upvotes: 0