Lamma

Reputation: 1547

Adding mean lines to categorical jitter plot in ggplot2

I have a jitter plot with frequencies on the y axis and categories on the x axis. Within each category there are two groups:

ggplot(plot_core_FGT_free, aes(x = variable, y = value, colour = origin))+
  geom_jitter()+
  labs(y = "Frequency", x = "Metadata factors")+
  scale_x_discrete(labels = c("Gene duplication", "BGC proximity", "Horizontal gene transfer", "Known target"))+
  theme_bw()+
  theme(axis.line = element_line(colour = "black"), 
        panel.background = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank() ,
        text = element_text(size = 15),
        axis.text.x = element_text(angle = 20, hjust = 1))+
  scale_color_grey(start = 0.3, end = 0.7)


I want to add these means to each respective category and group:

     mean                 origin marker         
[1,] "0.3715"             "free" "Gene duplication"  
[2,] "0.15175"            "free" "BGC proximity"
[3,] "0.54125"            "free" "Horizontal gene transfer"    
[4,] "0.10525"            "free" "Known target" 
[5,] "0.344423076923077"  "FGT"  "Gene duplication"  
[6,] "0.146153846153846"  "FGT"  "BGC proximity"
[7,] "0.425576923076923"  "FGT"  "Horizontal gene transfer"    
[8,] "0.0790384615384615" "FGT"  "Known target"

I have tried using geom_segment and various stat_summary methods, to no avail. Can anyone help me?

EDIT:

ggplot(plot_core_FGT_free, aes(x = variable, y = value, colour = origin))+
  geom_boxplot(size = 1)+
  geom_point(position = position_jitterdodge(), size = 2, alpha = 0.4) +
  labs(y = "Frequency", x = "Metadata factors")+
  # only one scale_x_discrete() is kept: a second one would replace the first
  scale_x_discrete(labels = c("Gene duplication", "BGC proximity", "Horizontal gene transfer", "Known target"))+
  # theme_minimal() comes before theme(): placed last, it would reset these settings
  theme_minimal()+
  theme(text = element_text(size = 15),
        axis.text.x = element_text(angle = 20, hjust = 1))


Upvotes: 0

Views: 1177

Answers (2)

Edo

Reputation: 7858

I think it looks pretty good with a boxplot behind the points.

Personally, though, I found it really difficult to tell apart the colours you chose.

library(dplyr)
library(ggplot2)

# recreate a mock of your data
df1 <- tibble(value = rnorm(100, 0.3715 ), origin = "free", variable = "Gene duplication")
df2 <- tibble(value = rnorm(100, 0.15175), origin = "free", variable = "BGC proximity")
df3 <- tibble(value = rnorm(100, 0.54125), origin = "free", variable = "Horizontal gene transfer")
df4 <- tibble(value = rnorm(100, 0.10525), origin = "free", variable = "Known target")
df5 <- tibble(value = rnorm(100, 0.344423076923077 ), origin = "FGT" , variable = "Gene duplication")
df6 <- tibble(value = rnorm(100, 0.146153846153846 ), origin = "FGT" , variable = "BGC proximity")
df7 <- tibble(value = rnorm(100, 0.425576923076923 ), origin = "FGT" , variable = "Horizontal gene transfer")
df8 <- tibble(value = rnorm(100, 0.0790384615384615), origin = "FGT" , variable = "Known target")
df <- bind_rows(df1,df2,df3,df4,df5,df6,df7,df8)

Your Chart + boxplot:

ggplot(df, aes(x = variable, y = value, colour = origin))+
  geom_boxplot(size = 1)+
  geom_jitter()+
  labs(y = "Frequency", x = "Metadata factors") +
  theme_minimal() +
  theme(axis.line = element_line(colour = "black"),
        panel.background = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank() ,
        text = element_text(size = 15),
        axis.text.x = element_text(angle = 20, hjust = 1)) +
  scale_color_grey(start = 0.3, end = 0.7)

If I may, I would suggest this option instead:

ggplot(df, aes(x = variable, y = value, colour = origin))+
  geom_boxplot(size = 1)+
  geom_point(position = position_jitterdodge(), size = 2, alpha = 0.4) +
  labs(y = "Frequency", x = "Metadata factors")+
  scale_x_discrete(labels = function(x) stringr::str_replace(x, " ", "\n")) +
  theme_minimal()
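
A boxplot marks the median rather than the mean, so if the group means themselves are needed, one possible extension is to let stat_summary compute them and dodge them the same way as the points. This is only a sketch under some assumptions: the data frame below is a hypothetical stand-in for the real data, dodge_width is a name introduced here for illustration, and the fun, fun.min and fun.max arguments assume ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Hypothetical mock data standing in for the real data set
set.seed(1)
df <- data.frame(
  variable = rep(c("Gene duplication", "BGC proximity"), each = 40),
  origin   = rep(rep(c("free", "FGT"), each = 20), times = 2),
  value    = runif(80)
)

# Shared dodge width so that points and mean lines line up (illustrative name)
dodge_width <- 0.75

p <- ggplot(df, aes(x = variable, y = value, colour = origin)) +
  geom_point(position = position_jitterdodge(dodge.width = dodge_width),
             size = 2, alpha = 0.4) +
  # One horizontal line per variable/origin group, drawn at the group mean.
  # fun.min and fun.max are also set to mean so the crossbar collapses to a line.
  stat_summary(fun = mean, fun.min = mean, fun.max = mean,
               geom = "crossbar",
               position = position_dodge(width = dodge_width),
               width = 0.6) +
  labs(y = "Frequency", x = "Metadata factors") +
  theme_minimal()

p
```

Because the means are computed from the plotted data, no separate table of means is required with this approach.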


Upvotes: 2

Rui Barradas

Reputation: 76575

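The following plots the mean lines. The trick is to pass a new data argument to geom_segment. Because the x axis is discrete, marker is converted to its integer position so that the segment endpoints can be offset around it. The variable segm_len sets the length of each segment in x-axis units. The code is simplified to focus on the problem in the question.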

library(ggplot2)
library(dplyr)

segm_len <- 0.8

ggplot(df, aes(variable, value, color = origin)) +
  geom_jitter() +
  geom_segment(data = dfmean %>% mutate(marker = as.integer(factor(marker))),
               aes(x = marker - segm_len/2, xend = marker + segm_len/2, 
                   y = mean, yend = mean, 
                   color = origin)) +
  scale_color_grey(start = 0.3, end = 0.7) +
  theme(axis.text.x = element_text(angle = 20, hjust = 1))

Data

dfmean <- read.table(text = '
     mean                 origin marker         
"0.3715"             "free" "Gene duplication"  
"0.15175"            "free" "BGC proximity"
"0.54125"            "free" "Horizontal gene transfer"    
"0.10525"            "free" "Known target" 
"0.344423076923077"  "FGT"  "Gene duplication"  
"0.146153846153846"  "FGT"  "BGC proximity"
"0.425576923076923"  "FGT"  "Horizontal gene transfer"    
"0.0790384615384615" "FGT"  "Known target"
', header = TRUE)

dfmean[[1]] <- as.numeric(as.character(dfmean[[1]]))
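
If the means are not already available as a table, a data frame of the same shape as dfmean could be derived from the plotted data. The following is a minimal sketch: the small tibble is a hypothetical stand-in for the real long-format data, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for the long-format data used in the plot
df <- tibble(
  variable = rep(c("Gene duplication", "Known target"), each = 4),
  origin   = rep(c("free", "free", "FGT", "FGT"), times = 2),
  value    = c(0.2, 0.4, 0.1, 0.3, 0.0, 0.2, 0.1, 0.1)
)

# One mean per origin/category pair, with the category column named
# marker so that it matches the dfmean layout used in the plot above
dfmean <- df %>%
  group_by(origin, marker = variable) %>%
  summarise(mean = mean(value), .groups = "drop")

dfmean
```

Note that as.integer(factor(marker)) numbers the categories alphabetically, so the segments only line up with the axis when variable is plotted in that same order (as it is for a character column).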

Upvotes: 1
