Reputation: 367
I'm working with the following data from 2019 to 2022
library(tidyverse)
library(lubridate)
set.seed(123)
df <-
tibble(
date = sample(
seq(as.Date('2019/01/01'), as.Date('2022/09/01'), by = "day"),
size = 100,
replace = T))
I would like to create a new variable that summarises each week into the same month-day range across 2019-2022. The result should be the week staring the same month-day regardless of what weekday or week of the year that date happens to be each year.
For example
I've tried using cat.Date()
and lubridate::floor_date()
but these return the floor date for that particular year. For example
df %>%
mutate(
date_month_week = lubridate::floor_date(date, unit = "week", week_start = 7),
date_month_week = format(as.Date(date_month_week), "%m-%d")) %>%
arrange(desc(date_month_week)) %>%
distinct(date_month_week)
Yields
# A tibble: 77 × 1
date_month_week
<chr>
1 12-20
2 12-15
3 12-12
4 12-08
5 12-06
6 12-01
7 11-28
8 11-24
9 11-22
10 11-21
Many thanks for your help.
Upvotes: 1
Views: 93
Reputation: 367
I've developed a kludge solution, but would love hear a more elegant alternative.
We can create a separate df that matches every month-day to a corresponding "Week of %b-%d"
ls_groups <-
tibble(
date = seq(
as.Date('2018/01/01'), # 2018 starts on a Monday
as.Date('2018/12/31'), # Dec 31 is week 53, need to fix later
by = "day"),
month_day = format(date, "%Y-%m-%d"),
group = sort(c(rep(seq(1,52), 7), 53))) %>% # Add week 53 manually
group_by(group) %>%
mutate(
group = min(month_day), # Take the minimum date so it's always starting on Monday
group = format(as.Date(group), "Week start %b-%d"), # Convert to text format
month_day = format(date, "%m-%d")) %>% # Create a join key
select(-date)
Then we join it into the original data frame
df %>%
mutate(month_day = format(date, "%m-%d")) %>% # Create a join key
left_join(
x = .,
y = ls_groups,
by = "month_day") %>%
select(-month_day) %>%
arrange(group) # Arrange to check that diff. years have same month-day
Which gives us
# A tibble: 100 × 2
date group
<date> <chr>
1 2020-04-07 Week start Apr-02
2 2021-04-19 Week start Apr-16
3 2022-04-16 Week start Apr-16
4 2021-04-23 Week start Apr-23
5 2020-04-24 Week start Apr-23
6 2020-05-04 Week start Apr-30
7 2019-05-01 Week start Apr-30
8 2021-08-10 Week start Aug-06
9 2022-08-06 Week start Aug-06
10 2020-08-10 Week start Aug-06
# … with 90 more rows
# ℹ Use `print(n = ...)` to see more rows
Upvotes: 1