Anton

Reputation: 31

Display the total number of bin elements in a stacked histogram with ggplot2

I'd like to show data values on a stacked histogram in ggplot2. After many attempts, the only way I found to show the total count for each bin is the following code:

library(ggplot2)

set.seed(1234)

df <- data.frame(
  sex=factor(rep(c("F", "M"), each=200)),
  weight=round(c(rnorm(200, mean=55, sd=5), rnorm(200, mean=65, sd=5)))
)

p<-ggplot(df, aes(x=weight, fill=sex, color=sex))
p<-p + geom_histogram(position="stack", alpha=0.5, binwidth=5)

tbl <- (ggplot_build(p)$data[[1]])[, c("x", "count")]
agg <- aggregate(tbl["count"], by=tbl["x"], FUN=sum)

# add one text layer per non-empty bin, placed just above the stacked bar
for(i in seq_along(agg$x))
  if(agg$count[i] > 0)
    p <- p + geom_text(x=agg$x[i], y=agg$count[i] + 1.5, label=agg$count[i], colour="black")

which generates the following plot:

[image: stacked histogram with the total count shown above each bin]

Is there a better (and more efficient) way to get the same result using ggplot2? Thanks a lot in advance

Upvotes: 3

Views: 3526

Answers (1)

eipi10

Reputation: 93851

You can use stat_bin to count up the values and add text labels.

p <- ggplot(df, aes(x=weight)) +
  geom_histogram(aes(fill=sex, color=sex), 
                 position="stack", alpha=0.5, binwidth=5) +
  stat_bin(aes(y=..count.. + 2, label=..count..), geom="text", binwidth=5)

I moved the fill and color aesthetics into geom_histogram so that they apply only to that layer rather than globally to the whole plot: we want stat_bin to generate an overall count for each bin, not separate counts for each level of sex. ..count.. is an internal variable computed by stat_bin that stores the per-bin counts.

[image: stacked histogram with stat_bin count labels above each bin]
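As a side note, ggplot2 3.4.0 deprecated the `..count..` notation in favour of `after_stat(count)`, which has been available since ggplot2 3.3.0. The following is a minimal sketch of the same plot in the newer notation, assuming a reasonably recent ggplot2. The `totals` object is a name introduced here for illustration only; it uses `layer_data()` to look at what stat_bin computed for the text layer.

```r
library(ggplot2)

# Same simulated data as in the question
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5), rnorm(200, mean = 65, sd = 5)))
)

p <- ggplot(df, aes(x = weight)) +
  geom_histogram(aes(fill = sex, color = sex),
                 position = "stack", alpha = 0.5, binwidth = 5) +
  # after_stat(count) refers to the per-bin count computed by stat_bin
  stat_bin(aes(y = after_stat(count) + 2, label = after_stat(count)),
           geom = "text", binwidth = 5)

# Data computed for the second layer (the text labels), one row per bin
totals <- layer_data(p, 2)

# Every observation falls in exactly one bin, so the labels add up to nrow(df)
sum(totals$count) == nrow(df)
```

Printing `p` draws the plot; the final line is only a sanity check that the bin totals account for every row of `df`.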

In this case, it was straightforward to add the counts directly. However, in more complicated situations, you might sometimes want to summarise the data outside of ggplot and then feed the summary data to ggplot. Here's how you would do that in this case:

library(dplyr)

counts = df %>% group_by(weight = cut(weight, seq(30,100,5), right=FALSE)) %>%
  summarise(n = n())

countsByGroup = df %>% group_by(sex, weight = cut(weight, seq(30,100,5), right=FALSE)) %>%
  summarise(n = n())

# fill and color are mapped only in geom_bar because counts has no sex column
ggplot(countsByGroup, aes(x=weight)) +
  geom_bar(aes(y=n, fill=sex, color=sex), stat="identity", alpha=0.5, width=1) +
  geom_text(data=counts, aes(label=n, y=n+2), colour="black")

Or, you can just create countsByGroup and then create the equivalent of counts on the fly inside ggplot:

# summarise() gives one total per bin, so each label is drawn only once
ggplot(countsByGroup, aes(x=weight)) +
  geom_bar(aes(y=n, fill=sex, color=sex), stat="identity", alpha=0.5, width=1) +
  geom_text(data=countsByGroup %>% group_by(weight) %>% summarise(n=sum(n)),
            aes(label=n, y=n+2), colour="black")
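For what it's worth, the summarising step can probably be shortened a little. The sketch below assumes dplyr's `count()` (which accepts the same grouping expressions as `group_by()`) and ggplot2's `geom_col()` (shorthand for `geom_bar(stat="identity")`, available since ggplot2 2.2.0). The names `breaks` and `p2` are introduced here for illustration and are not part of the code above.

```r
library(dplyr)
library(ggplot2)

# Same simulated data as in the question
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5), rnorm(200, mean = 65, sd = 5)))
)

# Bin edges shared by both summaries (same edges as in the answer above)
breaks <- seq(30, 100, 5)

# One row per sex and bin; count() stores the tally in column n
countsByGroup <- df %>%
  count(sex, weight = cut(weight, breaks, right = FALSE))

# One row per bin, adding up the per-sex tallies
counts <- countsByGroup %>%
  group_by(weight) %>%
  summarise(n = sum(n))

p2 <- ggplot(countsByGroup, aes(x = weight, y = n)) +
  # fill/color are mapped here only, because counts has no sex column
  geom_col(aes(fill = sex, color = sex), alpha = 0.5, width = 1) +
  geom_text(data = counts, aes(label = n, y = n + 2), colour = "black")
```

Note that these bins come from `cut()`, so their edges are whatever `breaks` specifies and need not line up with the bins that `geom_histogram(binwidth=5)` picks by default.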

Upvotes: 6
