ralph

Reputation: 93

R plots with equal color mapping

The following code produces the shown plot.

library(MASS)        # provides mvrnorm()
library(ggplot2)
library(dendextend)
library(cowplot)
set.seed(1234)

N<-10

set1 <- mvrnorm(n = N, c(0,0), matrix(c(0.5,0,0,0.5),2))
df <- data.frame(set1,label=1:N)

# ?dist
# dist method options: "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski"
set1.dist <- dist(x=df[1:2],method = "euclidean")

fit1 <- hclust(d=set1.dist, method = "complete")
df$cluster <- cutree(fit1,k = 3)

p1 <- ggplot(df) +
   geom_text(aes(x=X1,y=X2,label=label ,color=as.factor(cluster)))+
   theme(legend.position = "none")
# p1

# ?hclust
# hclust method options "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).
p2 <- hclust(d=set1.dist, method = "complete") %>%
   as.dendrogram() %>% 
   color_labels(k=3) %>% 
   set("branches_k_color", k = 3) %>% 
   as.ggdend() %>% 
   ggplot(horiz = TRUE, theme = NULL) + 
   theme(axis.title.y = element_blank(),
         axis.text.y = element_blank(),
         axis.ticks.y = element_blank())
#plot(fit1,hang = -1, labels = df$label, main="Test",xlab = "")

#?rect.hclust()
plot_grid(p1,p2)

Scatter plot and dendrogram

I want the clusters to have the same color assignment in both plots (scatter plot and dendrogram), but none of my attempts have worked. I think the clusters are ordered differently in the dendrogram, or something else is going on.

Upvotes: 1

Views: 104

Answers (1)

dc37

Reputation: 16178

The issue with your color pattern is that your values are not sorted in the same order in the two plots. In the right plot, hclust orders the ids according to their distances, while in the left plot they are sorted by their label id.

To get the same order, you need to apply the hclust order to your data frame. You can find it in the order element of your hclust object:

> fit1$order
 [1]  5  6  2  3 10  4  1  7  8  9

So you can now apply this order to your df (after defining the cluster ids):

fit1 <- hclust(d=set1.dist, method = "complete")
df$cluster <- cutree(fit1,k = 3)
df <- df[order(match(df$label, fit1$order)),]

            X1         X2 label cluster
5  -0.67846476  0.3034370     5       2
6   0.07798362  0.3578356     6       2
2   0.70596583  0.1961721     2       2
3   0.54889439  0.7668157     3       2
10 -1.70825344 -0.6293518    10       3
4  -0.04557927 -1.6586588     4       1
1   0.33742619 -0.8535244     1       1
7   0.36133829 -0.4064025     7       1
8   0.64431246 -0.3865271     8       1
9   0.59196977 -0.3991278     9       1
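As a quick check of why the row order matters, the sketch below compares the cluster ids in data order with the same ids read along the dendrogram. It uses base R only, with rnorm as a stand-in for mvrnorm, so the numbers will differ from those above; pts, fit, cl, in_dend_order and first_seen are illustrative names, not part of the original code.

```r
# Minimal sketch, base R only; the data are a stand-in for the question's set1
set.seed(1234)
pts <- matrix(rnorm(20, sd = sqrt(0.5)), ncol = 2)

fit <- hclust(dist(pts), method = "complete")
cl  <- cutree(fit, k = 3)

# cutree numbers clusters by first appearance in the data,
# so observation 1 is always in cluster 1
print(cl)

# The same ids, read along the leaves of the dendrogram
in_dend_order <- cl[fit$order]
print(in_dend_order)

# Order in which the clusters are first met along the dendrogram
first_seen <- unique(in_dend_order)
print(first_seen)
```

If first_seen is not 1, 2, 3, the cluster numbering and the dendrogram order disagree, which is the situation shown in the reordered data frame above.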

Now, for the first plot, you need to convert cluster to a factor whose levels follow this order:

library(dplyr)   # provides mutate()
p1 <- df %>% mutate(cluster = factor(cluster, unique(cluster))) %>%
  ggplot()+
  geom_text(aes(x=X1,y=X2,label=label ,color=cluster))+
  theme(legend.position = "none")

Then, the second plot does not change and you will finally get:

[Image: scatter plot and dendrogram with matching cluster colors]
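A possible variant, sketched under some assumptions: instead of reordering the rows, the factor levels can be set directly from the dendrogram order, and one explicit palette can be passed to both plots, since the default palettes of ggplot2 and dendextend may not produce exactly the same shades. The palette pal and the names pts, fit, cl, lev and dend are hypothetical, and rnorm again stands in for mvrnorm.

```r
library(ggplot2)
library(dendextend)

# Stand-in data (not the question's exact set1)
set.seed(1234)
pts <- matrix(rnorm(20, sd = sqrt(0.5)), ncol = 2)
df  <- data.frame(X1 = pts[, 1], X2 = pts[, 2], label = 1:10)

fit <- hclust(dist(df[1:2]), method = "complete")
cl  <- cutree(fit, k = 3)

# Cluster ids in the order they appear along the dendrogram
lev <- unique(cl[fit$order])
df$cluster <- factor(cl, levels = lev)

# Hypothetical palette shared by both plots
pal <- c("firebrick", "forestgreen", "steelblue")

# Scatter plot: unnamed manual values are matched to the factor levels in order
p1 <- ggplot(df) +
  geom_text(aes(x = X1, y = X2, label = label, color = cluster)) +
  scale_color_manual(values = pal) +
  theme(legend.position = "none")

# Dendrogram: the same palette for labels and branches
dend <- as.dendrogram(fit)
dend <- color_labels(dend, k = 3, col = pal)
dend <- set(dend, "branches_k_color", k = 3, value = pal)
p2 <- ggplot(as.ggdend(dend), horiz = TRUE, theme = NULL)
```

The two plots can then be combined with plot_grid(p1, p2) from cowplot, as in the question.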

Does this answer your question?

Upvotes: 1
