airpoll_epi

Reputation: 23

How to add percentages on top of a histogram when data is grouped

This is not my data (for confidentiality reasons), but I have tried to create a reproducible example using a dataset included in the ggplot2 library. I have a histogram summarizing the values of a variable by group (a factor with 2 levels). First, I wanted proportions of each group's total rather than counts, so I used this code:

library(ggplot2)
library(dplyr)

df_example <- diamonds %>% as.data.frame() %>% filter(cut=="Premium" | cut=="Ideal")

ggplot(df_example,aes(x=z,fill=cut)) + 
  geom_histogram(aes(y=after_stat(width*density)),binwidth=1,center=0.5,col="black") +
  facet_wrap(~cut) +
  scale_x_continuous(breaks=seq(0,9,by=1)) +
  scale_y_continuous(labels=scales::percent_format(accuracy=2,suffix="")) +
  scale_fill_manual(values=c("#CC79A7","#009E73")) +
  labs(x="Depth (mm)",y="Count") +
  theme_bw() + theme(legend.position="none")

It gave me one histogram per group, with bar heights expressed as a percentage of that group's total.

The issue is that I would like to print the numeric percentages on top of the bins and haven't found a way to do so.

As I saw it done for printing counts elsewhere, I attempted to print them using stat_bin(), including the same y and label values as the y in geom_histogram, thinking it would print the right numbers:

ggplot(df_example,aes(x=z,fill=cut)) + 
  geom_histogram(aes(y=after_stat(width*density)),binwidth=1,center=0.5,col="black") +
  stat_bin(aes(y=after_stat(width*density),label=after_stat(width*density*100)),geom="text",vjust=-.5) +
  facet_wrap(~cut) +
  scale_x_continuous(breaks=seq(0,9,by=1)) +
  scale_y_continuous(labels=scales::percent_format(accuracy=2,suffix="")) +
  scale_fill_manual(values=c("#CC79A7","#009E73")) +
  labs(x="Depth (mm)",y="%") +
  theme_bw() + theme(legend.position="none")

However, it prints far more values than there are bins, the values are not consistent with the bar heights, and they do not respect vjust=-.5, which should place them slightly above the bars.

What am I missing here? I know that without a grouping variable/facet_wrap I could use after_stat(count/sum(count)) instead of after_stat(width*density), and that would probably fix my issue. But I need the histograms for both groups to appear next to each other. Thanks in advance!

Upvotes: 2

Views: 172

Answers (1)

stefan

Reputation: 123978

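You have to pass the same binning arguments to stat_bin() as to geom_histogram() (here binwidth = 1 and center = 0.5) so that both layers use the same bins and the labels line up with the bars. Without them, stat_bin() falls back to its default of 30 bins, which is why you get more labels than bars and values that do not match the bar heights. I also formatted the labels as whole percentages with scales::number():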

library(ggplot2)
library(dplyr)

df_example <- diamonds %>%
  as.data.frame() %>%
  filter(cut == "Premium" | cut == "Ideal")

ggplot(df_example, aes(x = z, fill = cut)) +
  geom_histogram(aes(y = after_stat(width * density)),
    binwidth = 1, center = 0.5, col = "black"
  ) +
  stat_bin(
    aes(
      y = after_stat(width * density),
      label = scales::number(after_stat(width * density), scale = 100, accuracy = 1)
    ),
    geom = "text", binwidth = 1, center = 0.5, vjust = -.25
  ) +
  facet_wrap(~cut) +
  scale_x_continuous(breaks = seq(0, 9, by = 1)) +
  scale_y_continuous(labels = scales::number_format(scale = 100)) +
  scale_fill_manual(values = c("#CC79A7", "#009E73")) +
  labs(x = "Depth (mm)", y = "%") +
  theme_bw() +
  theme(legend.position = "none")
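If you want to check that the labels are what you expect, here is a minimal sketch that extracts the computed values with ggplot_build(). It assumes the same df_example subset as above; bins_for() is just a hypothetical helper name I made up for this check, not part of ggplot2. Since density is count / (sum(count) * width) within each panel, width * density should reduce to count / sum(count), so the values in each facet should add up to 1.

```r
library(ggplot2)

# Same subset as in the question, built with base R
df_example <- subset(as.data.frame(diamonds), cut %in% c("Premium", "Ideal"))

# Hypothetical helper: build a stat_bin layer with the given binning
# arguments and return the data ggplot2 computed for it
bins_for <- function(...) {
  p <- ggplot(df_example, aes(x = z)) +
    stat_bin(aes(y = after_stat(width * density)), ...) +
    facet_wrap(~cut)
  ggplot_build(p)$data[[1]]
}

matched    <- bins_for(binwidth = 1, center = 0.5)  # same bins as the histogram
defaulted  <- bins_for()                            # falls back to bins = 30

# Number of bins (and therefore labels) per facet under each binning
table(matched$PANEL)
table(defaulted$PANEL)

# Proportions per facet; each should sum to 1 (up to floating point error)
tapply(matched$y, matched$PANEL, sum)
```

The defaulted layer has many more rows per panel than the matched one, which is the "more values than bins" symptom from the question.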


Upvotes: 2
