Reputation: 147
With ggplot2 I want to plot two vectors (vec1_num, vec2_num) in two dimensions and colour the points by a group variable (vec3_char). Some data points are overlapping.
library(ggplot2)
vec1_num = c(1,2,3,4,1,3,4,5,5,5)
vec2_num = c(1,2,3,4,1,3,4,5,5,5)
vec3_char = c("A", "B", "C", "A", "B", "C", "C", "A", "B", "C")
# plot 1
ggplot(data = NULL) +
geom_point(aes(x=vec1_num, y=vec2_num, colour=vec3_char), alpha=0.4, size=4) +
scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "C"="orange")) +
theme(panel.grid = element_blank())
I know I can mitigate the overlap by reducing alpha or by using geom_jitter to add a bit of noise, like this:
# plot 2
ggplot(data = NULL) +
geom_jitter(aes(x=vec1_num, y=vec2_num, colour=vec3_char), alpha=0.4, size=4, width = 0.1) +
scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "C"="orange")) +
theme(panel.grid = element_blank())
However, is it possible to make use of plot 1 but colour the overlapping points differently? So that, for example, "A" = "darkblue", "AB" = "black", "ABC" = "grey", "B" = "darkred", "BC" = "pink", "C" = "orange"? And can I additionally add a small Venn diagram (legend) that visualises the colour choice for the point overlap?
Thanks!
Upvotes: 0
Views: 391
Reputation:
I would first create a data frame. Then, for every x-y combination (list(df$vec1_num, df$vec2_num)), I would extract which characters are present (...unique(xy_i$vec3_char)...). Like this:
df <- data.frame(vec1_num, vec2_num, vec3_char)
df_new <- do.call("rbind.data.frame", by(df, list(df$vec1_num, df$vec2_num), function(xy_i){
chars_i <- paste0(sort(unique(xy_i$vec3_char)),collapse= "")
xy_i$chars_comb <- factor(chars_i, levels= c("A", "AB", "AC", "ABC", "B", "BC", "C"))
xy_i
}))
If you now draw the plot, it shows which characters overlap at each point.
ggplot(data = df_new) +
geom_point(aes(x=vec1_num, y=vec2_num, colour=chars_comb), alpha=0.4, size=4) +
scale_colour_manual(values=c("A" = "darkblue", "AB" = "black", "ABC" = "grey", "B" = "darkred", "BC" = "pink", "C"="orange", "AC"= "red")) + # "A" and "BC" added so every factor level has a colour
theme(panel.grid = element_blank())
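As a rough alternative sketch of the same grouping step, the combination label can also be computed with base R's ave(), which keeps one row per original observation. The column name chars_comb follows the code above; the use of ave() is my assumption and not part of the original answer.

```r
# Same example data as in the question
vec1_num  <- c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5)
vec2_num  <- c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5)
vec3_char <- c("A", "B", "C", "A", "B", "C", "C", "A", "B", "C")
df <- data.frame(vec1_num, vec2_num, vec3_char, stringsAsFactors = FALSE)

# For every (x, y) position, paste the sorted unique group labels together,
# e.g. a position holding "B" and "A" gets the label "AB"
df$chars_comb <- ave(
  df$vec3_char,
  df$vec1_num, df$vec2_num,
  FUN = function(g) paste0(sort(unique(g)), collapse = "")
)

print(df)
```

The resulting chars_comb column can then be mapped to colour in geom_point() in the same way as in the plot above.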
Upvotes: 1
Reputation: 1959
My way of doing this would be to convert the letters into numbers, sum them and convert back into letters.
NB: The one complication is that the letters need to be A, B, D, H, ... (positions 1, 2, 4, 8, ... in the alphabet, i.e. powers of two) so that there is only one way of making each sum. There is probably a way to start with A, B, C, ... and encode them to unique values instead.
library(tidyverse)
vec1_num = c(1,2,3,4,1,3,4,5,5,5)
vec2_num = c(1,2,3,4,1,3,4,5,5,5)
vec3_char = c("A", "B", "D", "A", "B", "D", "D", "A", "B", "D")
removeDup <- function(str) paste(rle(strsplit(str, "")[[1]])$values, collapse="") # Collapse runs of consecutive identical characters in a string (e.g. "44" becomes "4")
data <- data.frame(x = vec1_num, y = vec2_num, col = match(vec3_char, LETTERS))
data <- data %>%
group_by(x, y) %>% # group by both coordinates so only points at the same position are merged
mutate(colour = glue::glue_collapse(col, sep = "")) %>%
select(-col) %>%
distinct(x, y, .keep_all = TRUE) %>%
mutate(colour = removeDup(colour)) %>%
mutate(colour = sapply(str_extract_all(colour, '\\d'), function(x) sum(as.integer(x)))) %>%
mutate(colour = case_when(
colour == 1 ~ "A",
colour == 2 ~ "B",
colour == 3 ~ "AB",
colour == 4 ~ "D",
colour == 5 ~ "AD",
colour == 6 ~ "BD",
colour == 7 ~ "ABD"
))
# plot 1
ggplot(data) +
geom_point(aes(x=x, y=y, colour = as_factor(colour)), alpha=0.4, size=4) +
geom_text(aes(x = x, y = y, label = colour), vjust = 2) +
scale_colour_manual(values=c("A"="darkblue", "B"="darkred", "AB"="orange", "D" = "green", "AD" = "black", "BD" = "orange", "ABD" = "purple"), name = "Colour") +
theme(panel.grid = element_blank())
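Regarding the NB above about starting with A, B, C, ...: one possible way, sketched here under my own assumptions, is to give each group a power-of-two weight (A = 1, B = 2, C = 4), sum the unique weights per point and decode the sum back into letters. The helper names weights and decode are hypothetical and not from the code above; only base R is used.

```r
# Example data from the question, with the original groups A, B, C
x <- c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5)
y <- c(1, 2, 3, 4, 1, 3, 4, 5, 5, 5)
g <- c("A", "B", "C", "A", "B", "C", "C", "A", "B", "C")

# Hypothetical lookup: one power of two per group, so every sum is unique
groups  <- c("A", "B", "C")
weights <- setNames(2^(seq_along(groups) - 1), groups)  # A = 1, B = 2, C = 4

# Sum the unique weights found at each (x, y) position
code <- ave(unname(weights[g]), x, y, FUN = function(w) sum(unique(w)))

# Hypothetical decoder: a group is present if its binary digit in the sum is 1
decode <- function(code) {
  paste0(groups[(code %/% weights) %% 2 == 1], collapse = "")
}
label <- vapply(code, decode, character(1))

print(data.frame(x, y, g, code, label))
```

The label vector could then be used as the colour aesthetic in place of the case_when() lookup, which avoids having to rename "C" to "D".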
Upvotes: 1