hnguyen

Reputation: 814

How to make a stacked bar chart with an overall error bar?

I'd like to make a bar chart similar to the one in https://www.nature.com/articles/s41893-019-0415-y: stacked bars with one overall error bar on top of each bar.

I have tried

ggplot(diamonds) +  
  geom_bar(aes(clarity, fill=color,
             ymin=count-sd(count), 
             ymax=count+sd(count))) +
  geom_errorbar( position = "identity", colour="black") 

but it returned a warning and an error:

Warning: Ignoring unknown aesthetics: ymin, ymax
Error in as.double(x) : cannot coerce type 'closure' to vector of type 'double'

Calculating the count and its SD separately doesn't help either; sd_count is NA for every row:

head(diamonds %>%
  group_by(color, clarity) %>%
  summarize(count = n(), 
            sd_count = sd(count)), 10)

# A tibble: 10 × 4
# Groups:   color [2]
   color clarity count sd_count
   <ord> <ord>   <int>    <dbl>
 1 D     I1         42       NA
 2 D     SI2      1370       NA
 3 D     SI1      2083       NA
 4 D     VS2      1697       NA
 5 D     VS1       705       NA
 6 D     VVS2      553       NA
 7 D     VVS1      252       NA
 8 D     IF         73       NA
 9 E     I1        102       NA
10 E     SI2      1713       NA
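The NA values most likely arise because, inside summarize(), the name `count` already refers to the single per-group value just created by `count = n()`, and the standard deviation of one number is undefined. A minimal sketch, assuming the goal is the spread of the per-color counts within each clarity, summarises in two steps instead (`per_cell` and `per_clarity` are illustrative names, not from the original post):

```r
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)  # provides the diamonds data set

# sd() of a single value is NA, which is what each (color, clarity) group sees
sd(42)

# Step 1: one row per clarity x color with its count
per_cell <- diamonds %>%
  group_by(clarity, color) %>%
  summarize(count = n(), .groups = "drop")

# Step 2: within each clarity, the total count and the SD of the per-color counts
per_clarity <- per_cell %>%
  group_by(clarity) %>%
  summarize(count_total = sum(count), sd_count = sd(count), .groups = "drop")

per_clarity
```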

This code produces exactly the stacked bar chart I want, but without the error bars:

ggplot(diamonds) +  
  geom_bar(aes(clarity, fill=color)) 


Many thanks for any leads on fixing my code.

Upvotes: 0

Views: 818

Answers (1)

walter

Reputation: 528

The problem is that geom_bar() computes the counts for you (through stat_count()), so those counts are not columns of your data frame and other aesthetics such as ymin and ymax cannot refer to them. Here I use geom_col() instead and compute the counts myself, including the total count per clarity and the standard deviation of the per-color counts within each clarity.
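The 'closure' error from the question can be reproduced outside ggplot2. A minimal sketch, assuming dplyr is attached (as it is in the question): `count` is not a column of `diamonds`, so the name resolves to the function `dplyr::count`, and sd() cannot convert a function into numbers. `res` is an illustrative name.

```r
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

# 'count' is not a column of diamonds ...
"count" %in% names(diamonds)

# ... so the name finds the function dplyr::count instead
is.function(count)

# sd() on a function fails the same way as the aes() call in the question
res <- tryCatch(sd(count), error = function(e) conditionMessage(e))
res
```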

Another option (df2, second plot below) is to replace the mutate() with a summarize(), so there is one row per clarity, and pass that as a separate data frame to geom_errorbar() (keeping your geom_bar()). I prefer this, as it draws each error bar only once per bar instead of overdrawing it once for every color.

library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

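# One row per clarity x color with its count; then, within each clarity,
# add the SD of those counts and the total (the stacked bar height)
df <- diamonds %>% group_by(clarity, color) %>% 
    summarize(count = n()) %>% 
    group_by(clarity) %>% 
    mutate(sd = sqrt(var(count)), count_total = sum(count)) %>% 
    ungroup()
#> `summarise()` has grouped output by 'clarity'. You can override using the `.groups` argument.

# geom_col() stacks the precomputed counts; geom_errorbar() is centred on the
# bar total and is drawn once per color row, so the error bars overlap
ggplot(df, aes(clarity, y = count, fill = color, ymin = count_total - sd, ymax = count_total + sd)) +  
    geom_col() +
    geom_errorbar()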

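# Same summary, but the second summarize() collapses to one row per clarity
df2 <- diamonds %>% group_by(clarity, color) %>% 
    summarize(count = n()) %>% 
    group_by(clarity) %>% 
    summarize(sd = sqrt(var(count)), count_total = sum(count)) %>% 
    ungroup()
#> `summarise()` has grouped output by 'clarity'. You can override using the `.groups` argument.

# geom_bar() counts the raw rows; the error bars come from df2 and do not
# inherit the fill aesthetic, so each bar gets exactly one error bar
ggplot(diamonds, aes(clarity, fill = color)) +  
    geom_bar() +
    geom_errorbar(data = df2, aes(clarity, ymin = count_total - sd, ymax = count_total + sd), inherit.aes = FALSE)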

Created on 2021-10-10 by the reprex package (v2.0.1)
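For reference, a more compact sketch of the df2 approach, assuming dplyr 1.0 or later for the `.groups` argument (which also silences the `summarise()` message above). It uses dplyr's count(), which names its count column `n`; `err_df` and `p` are illustrative names, and `width = 0.3` is only a cosmetic choice for narrower error bars.

```r
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

# Per-clarity bar total and SD of the per-color counts (one row per bar)
err_df <- diamonds %>%
  count(clarity, color) %>%      # column 'n' = number of rows per clarity x color
  group_by(clarity) %>%
  summarize(count_total = sum(n), sd = sd(n), .groups = "drop")

# Stacked bars from the raw data, one error bar per stack from err_df
p <- ggplot(diamonds, aes(clarity, fill = color)) +
  geom_bar() +
  geom_errorbar(
    data = err_df,
    aes(x = clarity, ymin = count_total - sd, ymax = count_total + sd),
    inherit.aes = FALSE,
    width = 0.3
  )
p
```

Because the error bars are centred on count_total, which equals the number of diamonds per clarity, each one sits exactly on top of its stacked bar.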

Upvotes: 2
