takeITeasy

Reputation: 360

How to perform hierarchical clustering with predefined clusters/classes?

I have a database on which I ran hierarchical clustering with agnes(), and it worked well (I followed the approach described at https://uc-r.github.io/hc_clustering). Now I want to compare the manmade clusters (classes) in the database with the ones that the hierarchical clustering found. I think I can do this with tanglegram(), but I do not know how to generate a dendrogram or run hierarchical clustering when I already have groups. How can I tell R about the groups? A step-by-step answer would be great.
`

library(cluster)     # agnes(), pltree()
library(factoextra)  # fviz_nbclust(), hcut()
library(dplyr)       # mutate(), count()

set.seed(73)
# numeric columns; the row IDs become row names so they can label the dendrogram
great <- data.frame(c1 = c(0.89, 46, 0, 0.56, 12, 0),
                    c2 = c(0, 0.45, 45, 79, 0.45, 4.4),
                    row.names = c("r1", "r2", "r3", "r4", "r5", "r6"))

# Euclidean distance
great_dist <- dist(great)

# agglomerative clustering with agnes()
# Ward's method merges, at each step, the pair of clusters whose merge
# increases the total within-cluster variance the least
hc1_wards <- agnes(great, method = "ward")

# agglomerative coefficient
hc1_wards$ac

pltree(hc1_wards, cex = 0.6, hang = -1, main = "Dendrogram\nagglomerative clustering", labels = FALSE)

# choosing the number of clusters with the average silhouette method
# (the default k.max = 10 exceeds the 6 rows available here)
fviz_nbclust(great, FUN = hcut, method = "silhouette", k.max = nrow(great) - 1)

# cut the tree into 2 groups
great_grp <- agnes(great, method = "ward")
great_grp_cut <- cutree(as.hclust(great_grp), k = 2)

# add the cluster each observation belongs to as a new column
great_cluster <- mutate(great, cluster = great_grp_cut)

# number of observations per cluster
count(great_cluster, cluster)

# evaluate the clustering with the Dunn index
dunn <- clValid::dunn(distance = great_dist, clusters = great_grp_cut)

`

Rows 1, 2, 4 and rows 3, 5, 6 of great are the manmade clusters.

cl1 <- great[c(1, 2, 4), ]
cl2 <- great[c(3, 5, 6), ]

I want to compare the hierarchical clustering with the manmade clustering. How can I build a dendrogram from the manmade clustering so that I can compare the two with tanglegram()? Is there another way to compare them?

Upvotes: 1

Views: 296

Answers (1)

Karolis Koncevičius

Reputation: 9656

To compare the clusters visually, you can use the plotDendroAndColors() function from the WGCNA package. It draws the dendrogram and, underneath it, one row of colors per grouping, so each object's cluster and manmade class line up with its leaf.

I cannot reproduce your example (the packages used in your code are not specified), so I am demonstrating this with a simple clustering of the iris dataset:

library(WGCNA)

# "ward.D" is what the older method = "ward" spelling maps to in current R
fit     <- hclust(dist(iris[, -5]), method = "ward.D")
groups  <- cutree(fit, 3)
manmade <- as.numeric(iris$Species)

plotDendroAndColors(fit, cbind(clusters = labels2colors(groups), manmade = labels2colors(manmade)))
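The color rows show agreement visually; a cross-tabulation and an agreement index can put a number on it. The following is a minimal sketch in base R on the same iris setup. `adjusted_rand()` is a helper written here for illustration only (it is not part of WGCNA or base R; packages such as mclust ship an equivalent, `adjustedRandIndex()`).

```r
# Adjusted Rand index computed from the contingency table of two labelings.
# Hypothetical helper for illustration; 1 means identical partitions,
# values near 0 mean agreement no better than chance.
adjusted_rand <- function(a, b) {
  tab   <- table(a, b)
  comb2 <- function(n) n * (n - 1) / 2          # number of pairs among n items
  sum_cells <- sum(comb2(tab))
  sum_rows  <- sum(comb2(rowSums(tab)))
  sum_cols  <- sum(comb2(colSums(tab)))
  expected  <- sum_rows * sum_cols / comb2(length(a))
  max_index <- (sum_rows + sum_cols) / 2
  (sum_cells - expected) / (max_index - expected)
}

fit     <- hclust(dist(iris[, -5]), method = "ward.D")
groups  <- cutree(fit, 3)
manmade <- as.numeric(iris$Species)

# Rows: clusters found by hclust; columns: manmade classes.
print(table(clusters = groups, manmade = iris$Species))

# Single-number summary of how well the two partitions agree.
print(adjusted_rand(groups, manmade))
```

The index ignores how the clusters are numbered, so cluster 1 does not need to correspond to class 1 for the comparison to be meaningful.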


Since you are using third-party packages for clustering, you might first have to convert their objects to hclust objects for this plotting function to work. Maybe via:

fit     <- as.hclust(hc1_wards)
groups  <- cutree(fit, k = 2)
manmade <- c(1, 1, 2, 1, 2, 2)  # rows 1, 2, 4 versus rows 3, 5, 6
plotDendroAndColors(fit, cbind(clusters = labels2colors(groups), manmade = labels2colors(manmade)))
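For the tanglegram() route mentioned in the question, a flat manmade partition has no hierarchy of its own, so a tree has to be constructed for it. One possible sketch, under an assumption that is mine and not from the question: encode the classes as a 0/1 distance (0 = same class, 1 = different class) and cluster that. The data is the question's toy example rewritten with numeric columns; base hclust() stands in for agnes(), and the tanglegram() call from the dendextend package only runs if that package is installed.

```r
# Toy data from the question, numeric columns, row IDs as row names.
great <- data.frame(
  c1 = c(0.89, 46, 0, 0.56, 12, 0),
  c2 = c(0, 0.45, 45, 79, 0.45, 4.4),
  row.names = c("r1", "r2", "r3", "r4", "r5", "r6")
)

# Manmade classes from the question: rows 1, 2, 4 versus rows 3, 5, 6.
manmade <- c(r1 = 1, r2 = 1, r3 = 2, r4 = 1, r5 = 2, r6 = 2)

# Data-driven tree (as.hclust() on an agnes object could be used instead).
hc_data <- hclust(dist(great), method = "ward.D2")

# Assumption: 0/1 "different class" distance as a stand-in for the manmade grouping.
member_dist <- as.dist(outer(manmade, manmade, "!=") * 1)
hc_manmade  <- hclust(member_dist, method = "average")

# Sanity check: cutting the manmade tree at k = 2 gives back the manmade classes.
print(table(tree = cutree(hc_manmade, k = 2), manmade = manmade))

# Side-by-side comparison of the two trees, if dendextend is available.
if (requireNamespace("dendextend", quietly = TRUE)) {
  dendextend::tanglegram(as.dendrogram(hc_data), as.dendrogram(hc_manmade))
}
```

The manmade tree built this way only encodes class membership: every merge inside a class happens at height 0, so the order of leaves within a class is arbitrary and should not be read as structure.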

Upvotes: 2
