Reputation: 13
When using ggplot's geom_histogram() to create a histogram with bins for each month, the bins don't appear to line up with the correct calendar months, nor have the desired height.
I have searched around for a while and have still not found a satisfactory solution for plotting a histogram of monthly data with the labels and bins accurately reflecting the calendar months. I have 2 goals, and have primarily tried out the first goal, but request assistance with the second goal as well.
Goal 1: [Efficiently] Use ggplot to make a histogram with a bin for each calendar month, and have each bin's month clearly labeled
Goal 2: [Efficiently] Use ggplot to make a histogram with a bin for each calendar month, and have only the calendar quarters clearly labeled (e.g. Oct 1 - Dec 31 is "Q4 2022")
Below I show an example with 400 observations during the period Nov 15, 2022 - Dec 31, 2023. Because the seed is set, we can reproducibly establish that there are 16 observations in Nov 2022. My code shows 2 attempts, one using binwidth and one using scale_x_date(date_breaks). Neither produces a plot with the labels precisely aligned with the appropriate bins, nor with the first bin having a y-value of 16.
library(ggplot2)
set.seed(123)
# Create a dataframe with 400 random dates between Nov 15 2022 and Dec 31 2023
start_date <- as.Date("2022-11-15")
end_date <- as.Date("2023-12-31")
random_dates <- sample(seq(start_date, end_date, by="days"), 400, replace=TRUE)
df <- data.frame(Date = random_dates)
# There are 16 observations in November 2022
sum(random_dates >= as.Date("2022-11-01") & random_dates <= as.Date("2022-11-30"))
# Using binwidth = 30
ggplot(df, aes(x = Date)) +
  geom_histogram(binwidth = 30, fill = "lightgrey", color = "black") +
  labs(title = "binwidth = 30 (Nov 15 2022 - Dec 31 2023)",
       x = "Date",
       y = "Count") +
  theme_minimal()
# Using date_breaks = '1 month' (and warning says that it defaults to bins = 30)
ggplot(df, aes(x = Date)) +
  geom_histogram(fill = "lightgrey", color = "black") +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  labs(title = "date_breaks = '1 month' (Nov 15 2022 - Dec 31 2023)",
       x = "Date",
       y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
These are the produced plots:
[Figure: histogram using the binwidth = 30 argument]
[Figure: histogram using scale_x_date(date_breaks = "1 month")]
Thank you in advance
Upvotes: 1
Views: 267
Reputation: 206546
For counts of discrete events, you want a bar chart, not a histogram. Histograms estimate the density function of continuous random variables. You can make a bar chart in this case by making a factor for each month/year. Here I used lubridate and interaction, but you could use something else if you prefer:
df |>
  transform(monthyear = factor(interaction(lubridate::month(Date), lubridate::year(Date)))) |>
  ggplot(aes(x = monthyear)) +
  geom_bar(width = 1, color = "white")
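As a quick cross-check on the bar heights, the per-month counts can be tabulated directly. This is only a sketch: it rebuilds the question's df so that it runs on its own, and the names all_days and month_counts are introduced here for illustration.

```r
# Recreate the question's data so the check is self-contained
set.seed(123)
all_days <- seq(as.Date("2022-11-15"), as.Date("2023-12-31"), by = "days")
df <- data.frame(Date = sample(all_days, 400, replace = TRUE))

# "%Y-%m" keys sort alphabetically in calendar order,
# so the table is already in chronological order
month_counts <- table(format(df$Date, "%Y-%m"))
month_counts

# Count for November 2022 (the question reports 16 for this month)
month_counts[["2022-11"]]
```

Each entry should match the height of the corresponding bar, since geom_bar simply counts the rows in each factor level.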
Upvotes: 0
Reputation: 174476
The neatest way to do this is probably to summarize your data beforehand. Find the month by using lubridate::floor_date, then summarize to count the number in each month. You can then turn the month column into a nicely formatted character vector and finally make that a factor so that the ordering is correct. The plot is then a very simple geom_col:
library(tidyverse)
df %>%
  mutate(month = lubridate::floor_date(Date, 'month')) %>%
  count(month) %>%
  # format() turns each month's first day into a label such as "Nov 2022";
  # fct_inorder() keeps the chronological row order as the factor order
  mutate(month = fct_inorder(format(month, '%b %Y'))) %>%
  ggplot(aes(month, n)) +
  geom_col(width = 1, color = 'black', fill = 'gray') +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
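For the second goal in the question (one bar per month, but only the quarters labeled), one possible variation is to keep a discrete axis and blank out every label except the first month present in each quarter. This is a rough sketch rather than a tested recipe for every data set: it rebuilds the question's df, uses base R for the counting, and the names monthly, quarter and axis_label are introduced here for illustration.

```r
library(ggplot2)

# Recreate the question's data so the sketch runs on its own
set.seed(123)
all_days <- seq(as.Date("2022-11-15"), as.Date("2023-12-31"), by = "days")
df <- data.frame(Date = sample(all_days, 400, replace = TRUE))

# Count observations per calendar month, keyed by the first day of the month
monthly <- as.data.frame(table(month = format(df$Date, "%Y-%m-01")),
                         stringsAsFactors = FALSE)
monthly$month <- as.Date(monthly$month)

# Build labels such as "Q4 2022" and keep each one only on the
# first month that appears in that quarter
monthly$quarter <- paste(quarters(monthly$month), format(monthly$month, "%Y"))
monthly$axis_label <- ifelse(duplicated(monthly$quarter), "", monthly$quarter)

# One bar per month; the discrete axis shows only the quarter labels
p <- ggplot(monthly, aes(x = factor(month), y = Freq)) +
  geom_col(width = 1, color = "black", fill = "gray") +
  scale_x_discrete(labels = monthly$axis_label) +
  labs(x = "Quarter", y = "Count")
p
```

Because the data start in mid-November, the "Q4 2022" label lands under the November bar, not an October one. If a month had no observations it would be missing from the axis entirely, so this approach assumes every month in the range has at least one row.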
Upvotes: 1