Jeff Parker

Reputation: 1969

ggplot using grouped date variables (such as year_month)

I feel like this should be an easy task for ggplot, tidyverse, lubridate, but I cannot seem to find an elegant solution.

GOAL: Create a bar graph of my data aggregated/summarized/grouped_by year and month.

#Libraries
library(tidyverse)
library(lubridate)

# Data
date <- sample(seq(as_date('2013-06-01'), as_date('2014-5-31'), by="day"), 10000, replace = TRUE)
value <- rnorm(10000)
df <- tibble(date, value)

# Summarise
df2 <- df %>%
  mutate(year = year(date), month = month(date)) %>%
  unite(year_month,year,month) %>%
  group_by(year_month) %>%
  summarise(avg = mean(value),
            cnt = n())
# Plot
ggplot(df2) +
  geom_bar(aes(x=year_month, y = avg), stat = 'identity')

When I create the year_month variable with unite(), it becomes a character variable instead of a date variable. I have also tried grouping by year(date) and month(date), but then I can't figure out how to use two variables as the x-axis in ggplot. Perhaps this could be solved by flooring the dates to the first day of the month?

Upvotes: 14

Views: 13349

Answers (1)

Jeffrey Girard

Reputation: 811

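You were really close. The missing pieces are floor_date() and scale_x_date(): floor_date() rounds each date down to the first day of its month, so the grouping variable stays a Date, and scale_x_date() controls how those dates are labeled on the axis: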

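library(tidyverse)
library(lubridate)

# Simulated data: 10,000 random dates over one year, each with a random value
date <- sample(seq(as_date('2013-06-01'), as_date('2014-05-31'), by = "day"),
  10000, replace = TRUE)
value <- rnorm(10000)

# Group by the first day of each month; the grouping column remains a Date
df <- tibble(date, value) %>% 
  group_by(month = floor_date(date, unit = "month")) %>%
  summarize(avg = mean(value))

# One bar per month, labeled like "Jun 13"; NULL drops the x-axis title
ggplot(df, aes(x = month, y = avg)) + 
  geom_bar(stat = "identity") + 
  scale_x_date(NULL, date_labels = "%b %y", date_breaks = "1 month")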
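
To see why flooring works where pasting year and month together does not, here is a minimal sketch on a few hand-picked dates. The sample dates and the variable names (dates, months, year_month) are made up for illustration and are not from the question's data:

```r
library(lubridate)

# Hypothetical sample dates, chosen only for illustration
dates <- as_date(c("2013-06-01", "2013-06-17", "2013-07-04", "2014-05-31"))

# floor_date() rounds each date down to the first day of its month
months <- floor_date(dates, unit = "month")

print(months)
print(class(months))      # still a Date, so scale_x_date() can format it

# For contrast, pasting year and month together gives a character vector,
# which ggplot treats as a discrete axis rather than a date axis
year_month <- paste(year(dates), month(dates), sep = "_")
print(class(year_month))
```

Because months is still a Date, ggplot places the bars on a continuous date axis in chronological order, and the labels can be formatted with date_labels.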


Upvotes: 23
