Cameron
Cameron

Reputation: 367

How to aggregate dates by "Week starting July 01" "Week starting July 08" etc over different years?

Data

I'm working with the following data from 2019 to 2022

library(tidyverse)
library(lubridate)

set.seed(123)

df <- 
  tibble(
    date = sample(
      seq(as.Date('2019/01/01'), as.Date('2022/09/01'), by = "day"), 
      size = 100, 
      replace = T))

Desired output

I would like to create a new variable that summarises each week into the same month-day range across 2019-2022. The result should be the week staring the same month-day regardless of what weekday or week of the year that date happens to be each year.

For example

Failed attempt

I've tried using cat.Date() and lubridate::floor_date() but these return the floor date for that particular year. For example

df %>% 
  mutate(
    date_month_week = lubridate::floor_date(date, unit = "week", week_start = 7),
    date_month_week = format(as.Date(date_month_week), "%m-%d")) %>% 
  arrange(desc(date_month_week)) %>% 
  distinct(date_month_week)

Yields

# A tibble: 77 × 1
   date_month_week
   <chr>          
 1 12-20          
 2 12-15          
 3 12-12          
 4 12-08          
 5 12-06          
 6 12-01          
 7 11-28          
 8 11-24          
 9 11-22          
10 11-21 

Many thanks for your help.

Upvotes: 1

Views: 93

Answers (1)

Cameron
Cameron

Reputation: 367

I've developed a kludge solution, but would love hear a more elegant alternative.

We can create a separate df that matches every month-day to a corresponding "Week of %b-%d"

ls_groups <- 
  tibble(
    date = seq(
      as.Date('2018/01/01'), # 2018 starts on a Monday
      as.Date('2018/12/31'), # Dec 31 is week 53, need to fix later
      by = "day"),
    month_day = format(date, "%Y-%m-%d"), 
    group = sort(c(rep(seq(1,52), 7), 53))) %>% # Add week 53 manually
  group_by(group) %>% 
  mutate(
    group = min(month_day), # Take the minimum date so it's always starting on Monday
    group = format(as.Date(group), "Week start %b-%d"), # Convert to text format
    month_day = format(date, "%m-%d")) %>%   # Create a join key
  select(-date)

Then we join it into the original data frame

df %>% 
  mutate(month_day = format(date, "%m-%d")) %>% # Create a join key
  left_join(
    x = ., 
    y = ls_groups, 
    by = "month_day") %>% 
  select(-month_day) %>% 
  arrange(group) # Arrange to check that diff. years have same month-day

Which gives us

# A tibble: 100 × 2
   date       group            
   <date>     <chr>            
 1 2020-04-07 Week start Apr-02
 2 2021-04-19 Week start Apr-16
 3 2022-04-16 Week start Apr-16
 4 2021-04-23 Week start Apr-23
 5 2020-04-24 Week start Apr-23
 6 2020-05-04 Week start Apr-30
 7 2019-05-01 Week start Apr-30
 8 2021-08-10 Week start Aug-06
 9 2022-08-06 Week start Aug-06
10 2020-08-10 Week start Aug-06
# … with 90 more rows
# ℹ Use `print(n = ...)` to see more rows

Upvotes: 1

Related Questions