The following code draws a scatter plot of the labelled points next to a dendrogram of the same clustering.
library(MASS)  # for mvrnorm()
library(ggplot2)
library(dendextend)
library(cowplot)
set.seed(1234)
N <- 10
set1 <- mvrnorm(n = N, c(0, 0), matrix(c(0.5, 0, 0, 0.5), 2))
df <- data.frame(set1, label = 1:N)
# ?dist
# dist method options: "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski"
set1.dist <- dist(x = df[1:2], method = "euclidean")
fit1 <- hclust(d = set1.dist, method = "complete")
df$cluster <- cutree(fit1, k = 3)
p1 <- ggplot(df) +
  geom_text(aes(x = X1, y = X2, label = label, color = as.factor(cluster))) +
  theme(legend.position = "none")
# p1
# ?hclust
# hclust method options: "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).
p2 <- hclust(d = set1.dist, method = "complete") %>%
  as.dendrogram() %>%
  color_labels(k = 3) %>%
  set("branches_k_color", k = 3) %>%
  as.ggdend() %>%
  ggplot(horiz = TRUE, theme = NULL) +
  theme(axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
# plot(fit1, hang = -1, labels = df$label, main = "Test", xlab = "")
# ?rect.hclust()
plot_grid(p1, p2)
I want the clusters to get the same colors in both plots (scatter and dendrogram), but none of my attempts have worked. I suspect the clusters are ordered differently in the dendrogram, or something else is wrong.
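The issue with your color pattern is that your values are not sorted in the same order in the two plots. In the dendrogram (right), the leaves follow the order computed by hclust, which arranges the observations according to their distances, and the k colors are handed out to the clusters in that leaf order. In the scatter plot (left), ggplot assigns the colors by factor level, i.e. by the cluster numbers from cutree, which follow the order of the observations (here, the label ids).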
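To see the mismatch without any plotting packages, here is a minimal base R sketch on a small hypothetical 1-D dataset (the values in x are made up for illustration). cutree() numbers the clusters by the first observation that falls into each one, while the dendrogram draws its leaves in hc$order, so reading the cluster ids along the leaves generally gives a different sequence than 1, 2, 3.

```r
# Hypothetical toy data: three well-separated groups in one dimension
x <- c(10, 0, 12, 1, 30)
hc <- hclust(dist(x), method = "complete")

# Cluster ids as cutree() assigns them, indexed by observation
cl <- cutree(hc, k = 3)
print(cl)

# The same ids read in the order the dendrogram draws its leaves
cl_along_leaves <- cl[hc$order]
print(cl_along_leaves)

# Clusters in the order they appear from the first leaf to the last;
# this is the order in which the dendrogram's colors are handed out
dend_cluster_order <- unique(cl_along_leaves)
print(dend_cluster_order)
```

Each cluster occupies a contiguous run of leaves, because a cut of the tree keeps whole subtrees together; only the order of those runs differs from the cutree numbering.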
To get the same order, you need to apply the hclust ordering to your data frame. You can find this order in the order element of your hclust object:
> fit1$order
[1] 5 6 2 3 10 4 1 7 8 9
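As a quick check that this is the same sequence the dendrogram uses for its leaves, base R's order.dendrogram() returns the leaf order of a dendrogram object. A minimal sketch, assuming a hypothetical random matrix mat in place of df[1:2]:

```r
set.seed(1)
# Hypothetical 2-column data standing in for df[1:2]
mat <- matrix(rnorm(20), ncol = 2)
hc <- hclust(dist(mat), method = "complete")
dend <- as.dendrogram(hc)

# Observation indices of the leaves, as stored in the hclust fit
print(hc$order)
# Leaf order recovered from the dendrogram object itself
print(order.dendrogram(dend))

all(order.dendrogram(dend) == hc$order)
```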
So, after assigning the cluster ids, reorder the rows of df to follow this order:
fit1 <- hclust(d = set1.dist, method = "complete")
df$cluster <- cutree(fit1, k = 3)
# Reorder the rows so they follow the dendrogram's leaf order
df <- df[order(match(df$label, fit1$order)), ]
df
X1 X2 label cluster
5 -0.67846476 0.3034370 5 2
6 0.07798362 0.3578356 6 2
2 0.70596583 0.1961721 2 2
3 0.54889439 0.7668157 3 2
10 -1.70825344 -0.6293518 10 3
4 -0.04557927 -1.6586588 4 1
1 0.33742619 -0.8535244 1 1
7 0.36133829 -0.4064025 7 1
8 0.64431246 -0.3865271 8 1
9 0.59196977 -0.3991278 9 1
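Because label is simply 1:N here, the match()/order() expression amounts to indexing the rows by fit1$order directly. A small sketch on a hypothetical data frame toy (coordinates made up) that checks the two forms agree:

```r
# Hypothetical stand-in for df: coordinates plus a label equal to the row number
toy <- data.frame(x = c(10, 0, 12, 1, 30), y = c(0, 1, 0, 1, 0), label = 1:5)
hc <- hclust(dist(toy[1:2]), method = "complete")
toy$cluster <- cutree(hc, k = 3)

# Form used above: position of each label in the leaf order, then sort by it
a <- toy[order(match(toy$label, hc$order)), ]
# Shorter form: hc$order already holds row positions
b <- toy[hc$order, ]

identical(a, b)
```

The shorter form relies only on row positions, since hc$order indexes the rows passed to dist(), so it keeps working even if the labels are not 1:N.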
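Now, for the first plot, convert cluster to a factor whose levels follow this order (mutate() comes from dplyr):
library(dplyr)  # for mutate()
# After the reordering, unique(cluster) lists the clusters in dendrogram order,
# so the factor levels (and therefore ggplot's colors) follow the dendrogram
p1 <- df %>%
  mutate(cluster = factor(cluster, levels = unique(cluster))) %>%
  ggplot() +
  geom_text(aes(x = X1, y = X2, label = label, color = cluster)) +
  theme(legend.position = "none")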
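If you would rather keep the rows of df in their original order (and skip dplyr), the factor levels can be taken straight from the leaf order. A base R sketch on hypothetical toy data:

```r
x <- c(10, 0, 12, 1, 30)  # hypothetical toy data
hc <- hclust(dist(x), method = "complete")
cl <- cutree(hc, k = 3)

# Levels = cluster ids in the order they appear along the dendrogram leaves,
# so the first factor level gets the first color, like the first cluster in the tree
cluster_f <- factor(cl, levels = unique(cl[hc$order]))
levels(cluster_f)
```

In the question's code this would correspond to df$cluster <- factor(df$cluster, levels = unique(df$cluster[fit1$order])) before building p1 with color = cluster.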
The second plot does not change, and plot_grid(p1, p2) now gives each cluster corresponding colors in both panels.
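One caveat: the default colors of dendextend and ggplot2 are not guaranteed to come from the same palette, so the reordering lines the clusters up, but the shades may not be exactly identical. One option is to build a single palette and pass it to both plots. A base R sketch of the mapping on hypothetical toy data, assuming grDevices::hcl.colors() (R >= 3.6) as the palette:

```r
x <- c(10, 0, 12, 1, 30)  # hypothetical toy data
hc <- hclust(dist(x), method = "complete")
cl <- cutree(hc, k = 3)

# Cluster ids from the first leaf to the last
dend_order <- unique(cl[hc$order])

# One palette, one color per cluster, named by cluster id and kept in leaf order
pal <- grDevices::hcl.colors(length(dend_order))
names(pal) <- dend_order
pal
```

The named vector could then go to ggplot2 as scale_color_manual(values = pal), where names are matched to the cluster values, and, unnamed and in leaf order, to dendextend, e.g. color_labels(k = 3, col = unname(pal)) and set("branches_k_color", k = 3, value = unname(pal)).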
Does this answer your question?