Bart

Reputation: 13

How to create histogram with aligned monthly bins?

When I use ggplot's geom_histogram() to create a histogram with one bin per month, the bins don't line up with the calendar months and don't have the expected heights.

I have searched for a while and have not found a satisfactory way to plot a histogram of monthly data in which the labels and bins accurately reflect the calendar months. I have 2 goals. I have mainly attempted the first, but would appreciate help with the second as well.

Goal 1: [Efficiently] Use ggplot to make a histogram with a bin for each calendar month, and have each bin's month clearly labeled

Goal 2: [Efficiently] Use ggplot to make a histogram with a bin for each calendar month, and only have the calendar quarters clearly labeled (e.g. Oct 1 - Dec 31 is "Q4 2022")

Below is an example with 400 observations during the period Nov 15, 2022 - Dec 31, 2023. Because the seed is set, we can reproducibly establish that there are 16 observations in Nov 2022. My code shows 2 attempts, one using binwidth and one using scale_x_date(date_breaks). Neither produces a plot in which the labels are precisely aligned with the appropriate bins, and in neither does the first bin have a y-value of 16.

library(ggplot2)
set.seed(123)

# Create a dataframe with 400 random dates between Nov 15 2022 and Dec 31 2023
start_date <- as.Date("2022-11-15")
end_date <- as.Date("2023-12-31")
random_dates <- sample(seq(start_date, end_date, by="days"), 400, replace=TRUE)
df <- data.frame(Date = random_dates)

# There are 16 observations in November 2022
sum(random_dates >= as.Date("2022-11-01") & random_dates <= as.Date("2022-11-30"))

# Using binwidth = 30
ggplot(df, aes(x = Date)) +
  geom_histogram(binwidth = 30, fill = "lightgrey", color = "black") +
  labs(title = "binwidth = 30 (Nov 15 2022 - Dec 31 2023)",
       x = "Date",
       y = "Count") +
  theme_minimal()

# Using date_breaks = '1 month' (and warning says that it defaults to bins = 30)
ggplot(df, aes(x = Date)) +
  geom_histogram(fill = "lightgrey", color = "black") +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  labs(title = "date_breaks = '1 month' (Nov 15 2022 - Dec 31 2023)",
       x = "Date",
       y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

These are the produced plots:

Histogram using binwidth = 30 argument

Histogram using scale_x_date(date_breaks = "1 month")

Thank you in advance

Upvotes: 1

Views: 267

Answers (2)

MrFlick

Reputation: 206546

For counts of discrete events, you want a bar chart, not a histogram. Histograms estimate the density function of continuous random variables. You can make a bar chart in this case by making a factor for each month/year. Here I used lubridate and interaction, but you could use something else if you prefer:

df |>
  transform(monthyear = factor(interaction(lubridate::month(Date), lubridate::year(Date)))) |>
  ggplot(aes(x = monthyear)) + 
  geom_bar(width=1, color="white")
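
If the `11.2022`-style level names are hard to read, one possible variant (a rough sketch, not part of the answer above) is to build the factor from "YYYY-MM" keys using base R only, and relabel the axis afterwards. The `month_label` helper is a hypothetical name introduced here for illustration, and the data setup simply repeats the question's example so the block runs on its own.

```r
library(ggplot2)

# Same example data as in the question
set.seed(123)
dates <- sample(seq(as.Date("2022-11-15"), as.Date("2023-12-31"), by = "days"),
                400, replace = TRUE)
df <- data.frame(Date = dates)

# "YYYY-MM" keys sort chronologically even as plain text, so the default
# alphabetical level order of the factor is also calendar order.
df$monthyear <- factor(format(df$Date, "%Y-%m"))

# Hypothetical helper: turn a "YYYY-MM" key into a "Mon YYYY" axis label
# (month names depend on the session locale).
month_label <- function(key) format(as.Date(paste0(key, "-01")), "%b %Y")

p <- ggplot(df, aes(x = monthyear)) +
  geom_bar(width = 1, color = "white") +
  scale_x_discrete(labels = month_label) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
print(p)
```

Because the bars sit on a discrete axis, every calendar month gets exactly one bar and one label, which is the alignment asked for in Goal 1.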


Upvotes: 0

Allan Cameron

Reputation: 174476

The neatest way to do this is probably to summarize your data beforehand. Find the month by using lubridate::floor_date, then summarize to count the number in each month. You can then turn the month column into a nicely formatted character vector and finally make that a factor so that the ordering is correct. The plot is then a very simple geom_col:

library(tidyverse)

df %>%
  mutate(month = lubridate::floor_date(Date, 'month')) %>%
  count(month) %>%
  mutate(month = fct_inorder(format(month, '%b %Y'))) %>%
  ggplot(aes(month, n)) +
  geom_col(width = 1, color = 'black', fill = 'gray') +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
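
For Goal 2 (quarter labels), one possible approach is to keep the x axis as a real date axis and draw each month as a rectangle running from the first of that month to the first of the next, then place breaks only at quarter starts. This is a sketch under assumptions, not something tested against every edge case: `quarter_floor`, `next_month` and `quarter_starts` are names made up for this example, and it uses base R instead of lubridate so that it is self-contained.

```r
library(ggplot2)

# Same example data as in the question
set.seed(123)
dates <- sample(seq(as.Date("2022-11-15"), as.Date("2023-12-31"), by = "days"),
                400, replace = TRUE)

# Count observations per calendar month, keyed by the first day of the month
tab <- table(format(dates, "%Y-%m-01"))
counts <- data.frame(month = as.Date(names(tab)), n = as.vector(tab))

# First day of the following month: adding 31 days to the 1st always lands
# in the next month, and formatting with "-01" snaps back to its first day.
counts$next_month <- as.Date(format(counts$month + 31, "%Y-%m-01"))

# Hypothetical helper: first day of the calendar quarter containing d
quarter_floor <- function(d) {
  m <- as.integer(format(d, "%m"))
  as.Date(sprintf("%s-%02d-01", format(d, "%Y"), 3L * ((m - 1L) %/% 3L) + 1L))
}

# One break per quarter start, from the first quarter that has data
quarter_starts <- seq(quarter_floor(min(counts$month)),
                      max(counts$next_month), by = "quarter")

p <- ggplot(counts) +
  geom_rect(aes(xmin = month, xmax = next_month, ymin = 0, ymax = n),
            fill = "lightgrey", color = "black") +
  scale_x_date(breaks = quarter_starts,
               labels = function(d) paste(quarters(d), format(d, "%Y")),
               # extend the axis left so the first quarter start gets a tick
               limits = c(quarter_starts[1], NA)) +
  labs(x = "Quarter", y = "Count") +
  theme_minimal()
print(p)
```

Each tick marks the first day of a quarter, so the bars between a "Q4 2022" tick and the following "Q1 2023" tick are October to December 2022. Drawing rectangles rather than `geom_col` is a deliberate choice here: on a date axis `geom_col` centers each bar on its x value, which would shift the bars half a month to the left of the calendar boundaries.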


Upvotes: 1
