Reputation: 475
I am trying to plot a correlation matrix that includes thousands of pairwise comparisons, and I am planning to use ggplot2 in R to plot it. There are 4 main issues I would like to address (some of them have already been addressed, but I can amend them if the proposed method has specific prerequisites; I am listing them here to ensure the final solution is compatible with them).
Below is the code for the toy dataset and my current approach.
Data input:
set.seed(1234)
M1<-matrix(rnorm(36)*3,nrow=6)
rownames(M1) <- c("s1", "s2", "s3", "s4", "s5", "s6")
colnames(M1) <- c("g1", "g2", "g3", "g4", "g5", "g6")
set.seed(2345)
M2<-matrix(rnorm(36),nrow=6)
rownames(M2) <- c("s1", "s2", "s3", "s4", "s5", "s6")
colnames(M2) <- c("g1", "g2", "g3", "g4", "g5", "g6")
M3 <- M1
diag(M3) <- NA
M3[upper.tri(M3)] <- M2[upper.tri(M2)]
cluster_annotation <- data.frame(cluster = c("c3", "c2", "c1"),
                                 cluster_anno = c("This is 3", "This is 2", "This is 111111111111111111111111This text has been cut"))
forgeneorder <- c("g1", "g4", "g5", "g3", "g6", "g2")
forsampleorder <- c("s1", "s4", "s5", "s3", "s6", "s2")
annotation_dataset <- data.frame(gene = c("g1", "g2", "g3", "g4", "g5", "g6"),
                                 cluster = c("c2", "c3", "c1", "c2", "c2", "c3"))
My current attempt:
library(tidyverse)  # dplyr, tidyr, tibble, ggplot2
library(ggrepel)    # geom_text_repel()
annotation_data <- annotation_dataset %>%
  as_tibble() %>%
  mutate(gene = factor(gene, levels = !!rev(forgeneorder))) %>%
  arrange(desc(gene)) %>%
  mutate(geneorder = rev(row_number())) %>%
  group_by(cluster) %>%
  mutate(cluster_order = rev(row_number()),
         cluster_min = min(cluster_order),
         cluster_max = max(cluster_order),
         cluster_middle = mean(geneorder)) %>%
  filter(cluster_order == cluster_min | cluster_order == cluster_max) %>%
  ungroup() %>%
  mutate(vertexes = ifelse(cluster_order == cluster_min, geneorder - 0.5, geneorder + 0.5),
         positions = ifelse(cluster_order == cluster_min, "bottumleft", "topright"),
         maxgene = max(geneorder)) %>%
  dplyr::select(-cluster_order, -cluster_min, -cluster_max, -geneorder, -gene) %>%
  spread(positions, vertexes) %>%
  left_join(cluster_annotation, by = "cluster") %>%
  mutate(bottumright = maxgene - bottumleft + 1,
         topleft = maxgene - topright + 1)
as_tibble(M3, rownames = "sample") %>%
  gather(gene, correlation, -sample) %>%
  mutate(gene = factor(gene, levels = !!rev(forgeneorder)),
         sample = factor(sample, levels = !!forsampleorder)) %>%
  ggplot() +
  geom_tile(aes(x = sample, y = gene, fill = correlation)) +
  with(annotation_data, annotate(geom = "rect", fill = "transparent", color = "black", size = 1.5,
                                 xmin = topleft, ymin = bottumleft, xmax = bottumright, ymax = topright)) +
  # with(annotation_data, annotate(geom = "text", color = "black", size = maxgene*1.2,
  #                                x = maxgene + 0.75, y = cluster_middle, label = cluster_anno, hjust = 0)) +
  geom_text_repel(data = annotation_data,
                  aes(x = maxgene + 0.75, y = cluster_middle, label = cluster_anno),
                  direction = "y",
                  hjust = 0,
                  segment.size = 0.2,
                  na.rm = TRUE,
                  xlim = c(NA, Inf)
  ) +
  scale_fill_gradient(low = "red", high = "green") +
  coord_equal(clip = "off") +
  theme_classic() +
  theme(axis.text = element_blank(),
        axis.line = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        legend.position = "top")
The current output:
Upvotes: 0
Views: 223
Reputation: 124473
One approach would be to use the "secondary axis trick" instead of adding the labels via geom_text_repel. As a discrete scale does not allow for a secondary axis, this requires converting your gene variable to a numeric so that a continuous scale can be used. And as you removed the axes completely, we have to add the axis text for the secondary scale back using theme(..., axis.text.y.right = element_text()):
library(ggplot2)
library(dplyr)   # mutate(), %>%
library(tidyr)   # gather()
library(tibble)  # as_tibble()
m3 <- as_tibble(M3, rownames = "sample") %>%
  gather(gene, correlation, -sample) %>%
  mutate(gene = factor(gene, levels = !!rev(forgeneorder)),
         sample = factor(sample, levels = !!forsampleorder))
ggplot(m3) +
  # as.numeric() turns the gene factor into its level index so a continuous y scale can be used
  geom_tile(aes(x = sample, y = as.numeric(gene), fill = correlation)) +
  # note: from ggplot2 3.4.0 on, linewidth replaces size for the rectangle border
  with(annotation_data, annotate(geom = "rect", fill = "transparent", color = "black", size = 1.5,
                                 xmin = topleft, ymin = bottumleft, xmax = bottumright, ymax = topright)) +
  # the duplicated axis carries one label per cluster, placed at the cluster's vertical midpoint
  scale_y_continuous(sec.axis = dup_axis(breaks = annotation_data$cluster_middle,
                                         labels = annotation_data$cluster_anno)) +
  scale_fill_gradient(low = "red", high = "green") +
  coord_equal(clip = "off") +
  theme_classic() +
  theme(axis.text = element_blank(),
        axis.line = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        axis.text.y.right = element_text(),  # re-enable text for the secondary axis only
        legend.position = "top")
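To see why the numeric conversion lines up with your cluster_middle values, here is a small base R sketch using the gene order and cluster assignment from your question. The names gene_pos and cluster_mid are only illustrative and not part of your code; the point is that as.numeric() on the factor returns the level index, which is the same position your geneorder column encodes.

```r
# Gene order and cluster assignment copied from the question's toy data.
forgeneorder <- c("g1", "g4", "g5", "g3", "g6", "g2")
genes   <- c("g1", "g2", "g3", "g4", "g5", "g6")
cluster <- c("c2", "c3", "c1", "c2", "c2", "c3")

# Reversed levels put the first gene of forgeneorder at the top of the y axis.
gene <- factor(genes, levels = rev(forgeneorder))

# as.numeric() on a factor returns the index of each value's level,
# i.e. the y position the tile gets on a continuous scale.
gene_pos <- as.numeric(gene)
names(gene_pos) <- genes
print(gene_pos)
# g1 g2 g3 g4 g5 g6
#  6  1  3  5  4  2

# Mean position per cluster: where each cluster label should sit.
# (illustrative stand-in for annotation_data$cluster_middle)
cluster_mid <- tapply(gene_pos, cluster, mean)
print(cluster_mid)
#  c1  c2  c3
# 3.0 5.0 1.5
```

These midpoints are the values passed as breaks to dup_axis(), so each label is centred on its cluster's block of tiles.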
This approach also works if you want to have the labels on the left, as you mentioned in your comment. In that case we simply have to position the primary y axis on the right using scale_y_continuous(..., position = "right"), so that the secondary axis ends up on the left, and add the axis text for the secondary scale via axis.text.y.left = element_text().
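A minimal, self-contained sketch of the left-hand variant could look like the following. It does not reuse M3 or annotation_data: the toy data frame, the cluster_labels data frame and the shortened label "This is 1" are assumptions made only so the snippet runs on its own. With your data you would keep the geom_tile() and annotate() layers from above and only change the scale and theme lines.

```r
library(ggplot2)

# Toy data: a 6 x 6 grid with the y position already numeric (1..6).
set.seed(1234)
toy <- expand.grid(sample = paste0("s", 1:6), gene_pos = 1:6)
toy$correlation <- rnorm(nrow(toy))

# Hypothetical stand-in for annotation_data: one label per cluster,
# placed at the cluster's vertical midpoint.
cluster_labels <- data.frame(
  cluster_middle = c(1.5, 3, 5),
  cluster_anno   = c("This is 3", "This is 1", "This is 2")
)

p_left <- ggplot(toy) +
  geom_tile(aes(x = sample, y = gene_pos, fill = correlation)) +
  # position = "right" moves the primary axis to the right, so the
  # duplicated (secondary) axis with the cluster labels lands on the left.
  scale_y_continuous(
    position = "right",
    sec.axis = dup_axis(breaks = cluster_labels$cluster_middle,
                        labels = cluster_labels$cluster_anno)
  ) +
  scale_fill_gradient(low = "red", high = "green") +
  coord_equal(clip = "off") +
  theme_classic() +
  theme(axis.text = element_blank(),
        axis.line = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        axis.text.y.left = element_text(),  # show text on the left axis only
        legend.position = "top")

p_left
```

Because axis.text is blanked first and only axis.text.y.left is switched back on, the primary axis on the right stays hidden while the cluster labels remain visible.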
Upvotes: 1