Gerg

Reputation: 336

How to label stacked histogram in ggplot

I am trying to add labels to the colored segments of the bars in a stacked histogram. Here is a reproducible example.

ggplot(aes(displ), data = mpg) + geom_histogram(aes(fill = class), binwidth = 1, col = "black")


This code produces a histogram and fills the bars with a different color for each car "class". Is there any way to add the "class" labels inside the corresponding colored segments of the graph?

Upvotes: 1

Views: 4525

Answers (2)

Michael Harper

Reputation: 15369

The built-in functions geom_histogram and stat_bin are perfect for quickly building plots in ggplot. However, for more advanced styling you often need to prepare the data before building the plot. In your case the labels overlap, which is visually messy.

The following code builds a binned frequency table from the data frame:

library(ggplot2)  # provides the mpg dataset
library(plyr)     # provides ddply

# Subset data
mpg_df <- data.frame(displ = mpg$displ, class = mpg$class)

# Bin data: breaks at 0.5, 1.5, ..., 7.5 give bins of width 1
binwidth <- 1
cuts <- seq(0.5, 8, binwidth)
mpg_df$bin <- .bincode(mpg_df$displ, cuts)

# Count the observations for each combination of class and bin
mpg_df <- ddply(mpg_df, .(class, bin), nrow)
names(mpg_df) <- c("class", "bin", "Freq")
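
For anyone who would rather not depend on plyr, the same binned frequency table can probably be built with base R alone. This is only a sketch under that assumption, not part of the original approach. The name binned is made up for illustration, and ggplot2 is loaded only for the mpg dataset.

```r
library(ggplot2)  # loaded only for the mpg dataset

# Same bin edges as above: 0.5, 1.5, ..., 7.5
cuts <- seq(0.5, 8, by = 1)
bin <- .bincode(mpg$displ, cuts)

# Cross-tabulate class against bin and flatten to a long data frame
# with columns class, bin and Freq
binned <- as.data.frame(table(class = mpg$class, bin = bin),
                        stringsAsFactors = FALSE)

# Drop empty combinations and restore the bin index as an integer
binned <- binned[binned$Freq > 0, ]
binned$bin <- as.integer(binned$bin)

head(binned)
```

The resulting binned data frame has the same three columns as mpg_df, so it should be usable in its place in the plotting code.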

You can use this new table to set a conditional label, so boxes are only labelled if there are more than a certain number of observations:

ggplot(mpg_df, aes(x = bin, y = Freq,  fill = class)) +
  geom_bar(stat = "identity", colour = "black", width = 1) +
  geom_text(aes(label=ifelse(Freq >= 4, as.character(class), "")),
   position=position_stack(vjust=0.5), colour="black")


I don't think it makes much sense to repeat the class labels in every bar; showing the frequency of each group may be more useful:

ggplot(mpg_df, aes(x = bin, y = Freq,  fill = class)) +
  geom_bar(stat = "identity", colour = "black", width = 1) +
  geom_text(aes(label=ifelse(Freq >= 4, Freq, "")),
   position=position_stack(vjust=0.5), colour="black")


Update

I realised you can selectively filter a label using the special ggplot variable ..count.., so there is no need to preformat the data:

ggplot(mpg, aes(x = displ, fill = class, label = class)) +
  geom_histogram(binwidth = 1,col="black") +
  stat_bin(binwidth=1, geom="text", position=position_stack(vjust=0.5), aes(label=ifelse(..count..>4, ..count.., "")))

This post is useful for explaining special variables within ggplot: Special variables in ggplot (..count.., ..density.., etc.)

This second approach only works if you want to label the bars with the counts. If you want to label them by class or another variable, you will have to prebuild the data frame using the first method.
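
As a side note, ggplot2 3.4.0 and later deprecate the ..count.. notation in favour of after_stat(). A minimal sketch of the same count-label plot in the newer syntax is below. It assumes a recent ggplot2 version, and the variable name p is only for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, fill = class)) +
  geom_histogram(binwidth = 1, colour = "black") +
  stat_bin(
    # Label a segment with its count only when the count exceeds 4
    aes(label = after_stat(ifelse(count > 4, as.character(count), ""))),
    binwidth = 1,
    geom = "text",
    position = position_stack(vjust = 0.5)
  )

p
```

Wrapping the whole ifelse() call in after_stat() means count is looked up in the data computed by stat_bin rather than in mpg.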

Upvotes: 4

be_green

Reputation: 753

Looking at the examples from the other Stack Overflow links you shared, all you need to do is change the vjust parameter.

ggplot(mpg, aes(x = displ, fill = class, label = class)) +
  geom_histogram(binwidth = 1,col="black") +     
  stat_bin(binwidth=1, geom="text", vjust=1.5)


That said, it looks like you have other issues. Namely, the labels overlap because there aren't many observations at each point. Instead, I'd just let people use the legend to read the graph.

Upvotes: 2
