Reputation: 1547
I have a jitter plot with a y axis of frequencies and an x axis of categories. Within each category there are two groups:
ggplot(plot_core_FGT_free, aes(x = variable, y = value, colour = origin))+
geom_jitter()+
labs(y = "Frequency", x = "Metadata factors")+
scale_x_discrete(labels = c("Gene duplication", "BGC proximity", "Horizontal gene transfer", "Known target"))+
theme_bw()+
theme(axis.line = element_line(colour = "black"),
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank() ,
text = element_text(size = 15),
axis.text.x = element_text(angle = 20, hjust = 1))+
scale_color_grey(start = 0.3, end = 0.7)
I want to add these means to each respective category and group:
mean origin marker
[1,] "0.3715" "free" "Gene duplication"
[2,] "0.15175" "free" "BGC proximity"
[3,] "0.54125" "free" "Horizontal gene transfer"
[4,] "0.10525" "free" "Known target"
[5,] "0.344423076923077" "FGT" "Gene duplication"
[6,] "0.146153846153846" "FGT" "BGC proximity"
[7,] "0.425576923076923" "FGT" "Horizontal gene transfer"
[8,] "0.0790384615384615" "FGT" "Known target"
I have tried using geom_segment and various stat_summary methods to no avail. Can anyone help me?
EDIT:
ggplot(plot_core_FGT_free, aes(x = variable, y = value, colour = origin))+
geom_boxplot(size = 1)+
geom_point(position = position_jitterdodge(), size = 2, alpha = 0.4) +
labs(y = "Frequency", x = "Metadata factors")+
# only one x scale is kept: a second scale_x_discrete() would replace the first
scale_x_discrete(labels = c("Gene duplication", "BGC proximity", "Horizontal gene transfer", "Known target"))+
# theme_minimal() comes before theme(); placed after, it would reset these settings
theme_minimal()+
theme(text = element_text(size = 15),
axis.text.x = element_text(angle = 20, hjust = 1))
Upvotes: 0
Views: 1177
Reputation: 7858
I think it looks pretty good with a boxplot behind.
Personally, though, I found it really difficult to tell apart the colours you chose...
library(dplyr)
library(ggplot2)
# recreate a mock of your data (seed fixed so the mock is reproducible)
set.seed(123)
df1 <- tibble(value = rnorm(100, 0.3715 ), origin = "free", variable = "Gene duplication")
df2 <- tibble(value = rnorm(100, 0.15175), origin = "free", variable = "BGC proximity")
df3 <- tibble(value = rnorm(100, 0.54125), origin = "free", variable = "Horizontal gene transfer")
df4 <- tibble(value = rnorm(100, 0.10525), origin = "free", variable = "Known target")
df5 <- tibble(value = rnorm(100, 0.344423076923077 ), origin = "FGT" , variable = "Gene duplication")
df6 <- tibble(value = rnorm(100, 0.146153846153846 ), origin = "FGT" , variable = "BGC proximity")
df7 <- tibble(value = rnorm(100, 0.425576923076923 ), origin = "FGT" , variable = "Horizontal gene transfer")
df8 <- tibble(value = rnorm(100, 0.0790384615384615), origin = "FGT" , variable = "Known target")
df <- bind_rows(df1,df2,df3,df4,df5,df6,df7,df8)
Your Chart + boxplot:
ggplot(df, aes(x = variable, y = value, colour = origin))+
geom_boxplot(size = 1)+
geom_jitter()+
labs(y = "Frequency", x = "Metadata factors") +
theme_minimal() +
theme(axis.line = element_line(colour = "black"),
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank() ,
text = element_text(size = 15),
axis.text.x = element_text(angle = 20, hjust = 1)) +
scale_color_grey(start = 0.3, end = 0.7)
If I may, I would suggest this option instead:
ggplot(df, aes(x = variable, y = value, colour = origin))+
geom_boxplot(size = 1)+
geom_point(position = position_jitterdodge(), size = 2, alpha = 0.4) +
labs(y = "Frequency", x = "Metadata factors")+
scale_x_discrete(labels = function(x) stringr::str_replace(x, " ", "\n")) +
theme_minimal()
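Note that the centre line of a boxplot is the median, not the mean. If the group means themselves are wanted, one possible sketch is to let stat_summary compute them and dodge them by the same width as the points. The data frame df_demo and the variable dodge_width below are made up for illustration; the marker shape and size are arbitrary choices.

```r
library(ggplot2)

# Hypothetical mock data standing in for the real data frame
set.seed(42)
df_demo <- data.frame(
  variable = rep(c("Gene duplication", "BGC proximity"), each = 40),
  origin   = rep(rep(c("free", "FGT"), each = 20), times = 2),
  value    = runif(80)
)

# Same width for points and means so they line up within each category
dodge_width <- 0.75

p <- ggplot(df_demo, aes(x = variable, y = value, colour = origin)) +
  geom_point(position = position_jitterdodge(dodge.width = dodge_width),
             size = 2, alpha = 0.4) +
  # one diamond per variable/origin group, placed at the group mean
  stat_summary(fun = mean, geom = "point", shape = 18, size = 5,
               position = position_dodge(width = dodge_width))

p
```

Because the means are computed from the plotted data, they stay in sync with the points if the data change.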
Upvotes: 2
Reputation: 76575
The following will plot the mean lines. The trick is to pass a new data argument to geom_segment. The variable segm_len is the segments' length. The code was simplified to focus on the problem in the question.
library(ggplot2)
library(dplyr)
segm_len <- 0.8
ggplot(df, aes(variable, value, color = origin)) +
geom_jitter() +
geom_segment(data = dfmean %>% mutate(marker = as.integer(factor(marker))),
aes(x = marker - segm_len/2, xend = marker + segm_len/2,
y = mean, yend = mean,
color = origin)) +
scale_color_grey(start = 0.3, end = 0.7) +
theme(axis.text.x = element_text(angle = 20, hjust = 1))
Data
dfmean <- read.table(text = '
mean origin marker
"0.3715" "free" "Gene duplication"
"0.15175" "free" "BGC proximity"
"0.54125" "free" "Horizontal gene transfer"
"0.10525" "free" "Known target"
"0.344423076923077" "FGT" "Gene duplication"
"0.146153846153846" "FGT" "BGC proximity"
"0.425576923076923" "FGT" "Horizontal gene transfer"
"0.0790384615384615" "FGT" "Known target"
', header = TRUE)
dfmean[[1]] <- as.numeric(as.character(dfmean[[1]]))
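If the raw data are at hand, the table of means does not have to be typed in. A minimal sketch, assuming a data frame with columns variable, origin and value; the names raw and dfmean_computed are hypothetical and the values are random mock data:

```r
library(dplyr)

# Hypothetical raw data with the same columns as the plotted data frame
set.seed(1)
raw <- tibble(
  variable = rep(c("Gene duplication", "Known target"), each = 10),
  origin   = rep(rep(c("free", "FGT"), each = 5), times = 2),
  value    = runif(20)
)

# One row per origin/variable combination, laid out like dfmean
dfmean_computed <- raw %>%
  group_by(origin, variable) %>%
  summarise(mean = mean(value), .groups = "drop") %>%
  rename(marker = variable) %>%
  select(mean, origin, marker)

dfmean_computed
```

The result has the same three columns as dfmean, so it should be usable as the data argument of geom_segment in the same way.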
Upvotes: 1