Mel

Reputation: 750

How to add labels with observation count to stat_summary ggplot?

I have a dataset, e.g.:

outcome <- c(rnorm(500, 45, 10), rnorm(250, 40, 12), rnorm(150, 38, 7), rnorm(1000, 35, 10), rnorm(100, 30, 7))
group <- c(rep("A", 500), rep("B", 250), rep("C", 150), rep("D", 1000), rep("E", 100))
reprex <- data.frame(outcome, group)

I can plot this as a "dynamite" plot with:

graph <- ggplot(reprex, aes(x=group, y=outcome, fill=..y..)) +
  stat_summary(geom = "bar", fun.y = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_cl_normal, width = 0.1)

giving:

picture of graph

I would also like to add a label beneath each column specifying how many observations are in that group. However, I can't work out how to do this. I tried:

graph + geom_label (aes(label=paste(..count.., "Obs.", sep=" ")), y=-0.75, size=3.5, color="black", fontface="bold")

which returns

Error in paste(count, "Obs.", sep = " ") : 
  cannot coerce type 'closure' to vector of type 'character'

I've also tried

  graph + stat_summary(aes(label=paste(..y.., "Obs.", sep=" ")), fun.y=count, geom="label")

but this returns:

Error: stat_summary requires the following missing aesthetics: y

I know that I can do this by first making a dataframe of summary statistics, but then I would have to create a new dataframe every time I need a graph. Ideally, I'd like to plot this using stat_summary() from the original dataset.

Does anyone know how to do this?

Upvotes: 5

Views: 2899

Answers (2)

dc37

Reputation: 16178

Without creating a new dataframe, you can compute the counts "on the fly" with dplyr, as follows:

library(dplyr)
library(ggplot2)
ggplot(reprex, aes(x=group, y=outcome, fill=..y..)) +
  stat_summary(geom = "bar", fun.y = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_cl_normal, width = 0.1)+
  geom_label(inherit.aes = FALSE, data = . %>% group_by(group) %>% count(), 
            aes(label = paste0(n, " Obs."), x = group), y = -0.5)
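
For a variant that stays entirely within stat_summary(), here is a minimal sketch. It assumes ggplot2 3.3.0 or later (for `fun` and `after_stat()`) and that Hmisc is installed for `mean_cl_normal`. The helper `n_label` and its `y_pos` argument are hypothetical names introduced for illustration. The idea is that a `fun.data` function may return a one-row data frame containing both a `y` position and a `label`, which the label geom then draws. The fill mapping is moved into the bar layer so that the label's fixed `y` value does not feed into the fill scale.

```r
library(ggplot2)

# Recreate the example data from the question.
set.seed(1)
outcome <- c(rnorm(500, 45, 10), rnorm(250, 40, 12), rnorm(150, 38, 7),
             rnorm(1000, 35, 10), rnorm(100, 30, 7))
group <- rep(c("A", "B", "C", "D", "E"), times = c(500, 250, 150, 1000, 100))
reprex <- data.frame(outcome, group)

# Hypothetical helper: stat_summary() calls it once per group with that
# group's y values; it returns where to draw the label and what it says.
n_label <- function(y, y_pos = -0.75) {
  data.frame(y = y_pos, label = paste(length(y), "Obs."))
}

graph <- ggplot(reprex, aes(x = group, y = outcome)) +
  # Map fill only in the bar layer (assumes ggplot2 >= 3.3.0).
  stat_summary(aes(fill = after_stat(y)), geom = "bar", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_cl_normal, width = 0.1) +
  # Count labels computed from the original data, drawn at y = y_pos.
  stat_summary(geom = "label", fun.data = n_label,
               size = 3.5, color = "black", fontface = "bold")
```

Printing `graph` draws the plot. The label position can be changed without editing the helper by passing `fun.args = list(y_pos = -2)` to the last layer.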

Upvotes: 5

StupidWolf

Reputation: 46888

You cannot use stat="count" when a y variable is already declared. I would say the easiest way is to create a small dataframe for the counts:

library(dplyr)
# One row per group: the mean (used as the label's y position) and the count
label_df = reprex %>% group_by(group) %>% summarise(outcome = mean(outcome), n = n())

Then plot using that:

ggplot(reprex, aes(x=group, y=outcome, fill=..y..)) +
  stat_summary(geom = "bar", fun.y = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_cl_normal, width = 0.1)+
  geom_text(data=label_df,aes(label=paste(n, "Obs.", sep=" ")), size=3.5, color="black", fontface="bold",nudge_y =1)

Upvotes: 2