thiagoveloso

Reputation: 2763

r - ggplot2 - Add differences to grouped bar charts

I am plotting the following data with ggplot2:

library(ggplot2)

DF <- structure(list(Type = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L), .Label = c("Observed", "Simulated"), class = "factor"), 
    variable = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L), .Label = c("EM to V6", 
    "V6 to R0", "R0 to R4", "R4 to R9"), class = "factor"), value = c(28, 
    30, 29, 35, 32, 34, 26, 29)), row.names = c(NA, -8L), .Names = c("Type", 
"variable", "value"), class = "data.frame")

ggplot(DF, aes(variable, value)) +
      geom_bar(aes(fill = Type), position = "dodge", stat="identity", width=.5) +
      geom_text(aes(label=value, group=Type), position=position_dodge(width=0.5), vjust=-0.5) +
      theme_bw(base_size = 18) +
      ylab('Duration (days)') + xlab('Growth stages')

[Image: grouped bar chart produced by the code above]

Is there a graphical way to show the difference between the two bars in each group on the chart?

This is the data frame with the differences to be added:

DF2 <- data.frame(variable=c("EM to V6", "V6 to R0", "R0 to R4", "R4 to R9"), value=c(2,6,2,3))
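
The same differences can also be derived from DF instead of being typed by hand. A minimal base R sketch, assuming each growth stage has exactly one Observed and one Simulated row and that both subsets come out in the same stage order (the `obs` and `sim` names are only illustrative):

```r
# Rebuild the example data with the same values as DF above
DF <- data.frame(
  Type = factor(rep(c("Observed", "Simulated"), times = 4)),
  variable = factor(
    rep(c("EM to V6", "V6 to R0", "R0 to R4", "R4 to R9"), each = 2),
    levels = c("EM to V6", "V6 to R0", "R0 to R4", "R4 to R9")
  ),
  value = c(28, 30, 29, 35, 32, 34, 26, 29)
)

# Split by type; the subtraction below is only valid if both
# subsets list the growth stages in the same order
obs <- DF[DF$Type == "Observed", ]
sim <- DF[DF$Type == "Simulated", ]
stopifnot(identical(as.character(obs$variable), as.character(sim$variable)))

# Simulated minus Observed, one row per growth stage
DF2 <- data.frame(variable = obs$variable, value = sim$value - obs$value)
DF2$value  # 2 6 2 3
```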

The final chart would look somewhat like this (notice the coloured bars):

[Image: clustered column chart with variance bars, from the source below]

source: https://www.excelcampus.com/charts/variance-clustered-column-bar-chart/

Is this possible with ggplot2?

Upvotes: 3

Views: 6678

Answers (1)

Marius

Reputation: 60060

As rawr suggested, you can add a layer of bars behind the current ones with a slightly smaller width:

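library(tidyverse)
# Reshape to one row per growth stage, with Observed and Simulated as columns
# (spread() is superseded by pivot_wider() in current tidyr but still works)
diff_df = DF %>%
    group_by(variable) %>%
    spread(Type, value) %>%
    mutate(diff = Simulated - Observed)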

ggplot(DF, aes(variable, value)) +
    geom_bar(aes(y = Simulated), data = diff_df, stat = "identity", fill = "grey80", width = 0.4) +
    geom_bar(aes(fill = Type), position = "dodge", stat="identity", width=.5) +
    geom_text(aes(label=value, group=Type), position=position_dodge(width=0.5), vjust=-0.5) +
    geom_text(aes(label = diff, y = Simulated), vjust=-0.5, data = diff_df, hjust = 2, colour = scales::muted("red")) +
    theme_bw(base_size = 18) +
    ylab('Duration (days)') + xlab('Growth stages')

Updated code to deal with Observed sometimes being higher than Simulated:

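library(tidyverse)
# One row per growth stage, plus the taller bar's height and which type is taller
diff_df = DF %>%
    group_by(variable) %>%
    spread(Type, value) %>%
    mutate(diff = Simulated - Observed,
           # max() is evaluated per group; each group holds a single row here
           max_y = max(Simulated, Observed),
           sim_higher = Simulated > Observed)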

ggplot(DF, aes(variable, value)) +
    geom_bar(aes(y = max_y), data = diff_df, stat = "identity", fill = "grey80", width = 0.4) +
    geom_bar(aes(fill = Type), position = "dodge", stat="identity", width=.5) +
    geom_text(aes(label=value, group=Type), position=position_dodge(width=0.5), vjust=-0.5) +
    geom_text(aes(label = diff, y = max_y), vjust=-0.5, data = diff_df %>% filter(sim_higher), 
              hjust = 2, colour = scales::muted("red")) +
    geom_text(aes(label = diff, y = max_y), vjust=-0.5, data = diff_df %>% filter(!sim_higher), 
              hjust = -1, colour = scales::muted("red")) +
    theme_bw(base_size = 18) +
    ylab('Duration (days)') + xlab('Growth stages')
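
Note that max_y above depends on group_by(variable) leaving one row per group. Without the grouping, max() would collapse both columns to a single overall maximum, whereas pmax() takes the maximum row by row. A minimal base R sketch of the difference, using the values of the first two growth stages from DF (the vector names are only illustrative):

```r
# Values for "EM to V6" and "V6 to R0" taken from DF
observed  <- c(28, 29)
simulated <- c(30, 35)

# max() pools all values and returns one number
overall <- max(simulated, observed)

# pmax() compares element by element and returns one value per row
per_row <- pmax(simulated, observed)

overall  # 35
per_row  # 30 35
```

Replacing max() with pmax() inside mutate() should therefore give the same max_y here while no longer relying on the grouping.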

Upvotes: 6
