Reputation: 17
I currently am aggregating data in exactly 4-week intervals, but I need the weeks to start on Saturdays. I am using the cut.Date
line below on below's reproducible example to create a column indicating each 4-week window, but I am struggling to find a resource (in either dplyr
or lubridate
) to aggregate in exactly 4-week intervals beginning on Saturday.
Here is what I have:
library(tidyverse)
# Example
example <- structure(list(
yr = c(2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016
),
date = structure(
c(17054, 17055, 17056, 17057, 17058, 17059,
17060, 17061, 17062, 17063, 17064, 17065, 17066, 17067, 17068,
17069, 17070, 17071, 17072, 17073, 17074, 17075, 17076, 17077,
17078, 17079, 17080, 17081, 17082, 17083, 17084, 17085, 17086,
17087, 17088, 17089, 17090, 17091, 17092, 17093), class = "Date"),
day_of_week = structure(
c(7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L,
3L, 4L),
.Label = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri",
"Sat"), class = c("ordered", "factor"))),
row.names = c(NA, -40L),
class = c("tbl_df", "tbl", "data.frame"))
# Using this in a mutate to generate the week
example %>%
mutate(wk = cut.Date(date, breaks = "4 week", labels = FALSE)) %>%
print()
And I would like it to look like the following:
# Ideal output
output <- structure(list(
yr = c(2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016),
date = structure(
c(17054, 17055, 17056, 17057, 17058, 17059,
17060, 17061, 17062, 17063, 17064, 17065, 17066, 17067, 17068,
17069, 17070, 17071, 17072, 17073, 17074, 17075, 17076, 17077,
17078, 17079, 17080, 17081, 17082, 17083, 17084, 17085, 17086,
17087, 17088, 17089, 17090, 17091, 17092, 17093), class = "Date"),
day_of_week = structure(
c(7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L,
3L, 4L),
.Label = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
class = c("ordered", "factor")),
wk = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L)),
row.names = c(NA, -40L),
class = c("tbl_df", "tbl", "data.frame"))
Again, the first 4-week period goes from 2016-09-10 to 2016-10-08 when using Saturdays, which I need. Hit me with any questions you may have, and thanks in advance!
Upvotes: 0
Views: 520
Reputation: 160447
We'll use integer-division (%/%
) to determine how many 28-day periods occurred between "today's date" and the first "Fri"
in the data.
Up front, your output
both starts and ends on a Saturday, which doesn't seem to cycle consistently. I'll assume that you want to start each 4-week cycle on a Saturday, so it'll end on a Friday.
example$wk <- with(example, as.numeric(date - date[which.max(day_of_week == "Sat")]) %/% 28 + 1L)
print(example, n = 40)
# # A tibble: 40 x 4
# yr date day_of_week wk
# <dbl> <date> <ord> <dbl>
# 1 2016 2016-09-10 Sat 1
# 2 2016 2016-09-11 Sun 1
# 3 2016 2016-09-12 Mon 1
# 4 2016 2016-09-13 Tue 1
# 5 2016 2016-09-14 Wed 1
# 6 2016 2016-09-15 Thu 1
# 7 2016 2016-09-16 Fri 1
# 8 2016 2016-09-17 Sat 1
# 9 2016 2016-09-18 Sun 1
# 10 2016 2016-09-19 Mon 1
# 11 2016 2016-09-20 Tue 1
# 12 2016 2016-09-21 Wed 1
# 13 2016 2016-09-22 Thu 1
# 14 2016 2016-09-23 Fri 1
# 15 2016 2016-09-24 Sat 1
# 16 2016 2016-09-25 Sun 1
# 17 2016 2016-09-26 Mon 1
# 18 2016 2016-09-27 Tue 1
# 19 2016 2016-09-28 Wed 1
# 20 2016 2016-09-29 Thu 1
# 21 2016 2016-09-30 Fri 1
# 22 2016 2016-10-01 Sat 1
# 23 2016 2016-10-02 Sun 1
# 24 2016 2016-10-03 Mon 1
# 25 2016 2016-10-04 Tue 1
# 26 2016 2016-10-05 Wed 1
# 27 2016 2016-10-06 Thu 1
# 28 2016 2016-10-07 Fri 1
# 29 2016 2016-10-08 Sat 2
# 30 2016 2016-10-09 Sun 2
# 31 2016 2016-10-10 Mon 2
# 32 2016 2016-10-11 Tue 2
# 33 2016 2016-10-12 Wed 2
# 34 2016 2016-10-13 Thu 2
# 35 2016 2016-10-14 Fri 2
# 36 2016 2016-10-15 Sat 2
# 37 2016 2016-10-16 Sun 2
# 38 2016 2016-10-17 Mon 2
# 39 2016 2016-10-18 Tue 2
# 40 2016 2016-10-19 Wed 2
library(dplyr)
example %>%
mutate(
wk = as.numeric(date - date[which.max(day_of_week == "Sat")]) %/% 28 + 1L
) %>%
print(n=99)
# # A tibble: 40 x 4
# yr date day_of_week wk
# <dbl> <date> <ord> <dbl>
# 1 2016 2016-09-10 Sat 1
# 2 2016 2016-09-11 Sun 1
# 3 2016 2016-09-12 Mon 1
# 4 2016 2016-09-13 Tue 1
# 5 2016 2016-09-14 Wed 1
# 6 2016 2016-09-15 Thu 1
# 7 2016 2016-09-16 Fri 1
# 8 2016 2016-09-17 Sat 1
# 9 2016 2016-09-18 Sun 1
# 10 2016 2016-09-19 Mon 1
# 11 2016 2016-09-20 Tue 1
# 12 2016 2016-09-21 Wed 1
# 13 2016 2016-09-22 Thu 1
# 14 2016 2016-09-23 Fri 1
# 15 2016 2016-09-24 Sat 1
# 16 2016 2016-09-25 Sun 1
# 17 2016 2016-09-26 Mon 1
# 18 2016 2016-09-27 Tue 1
# 19 2016 2016-09-28 Wed 1
# 20 2016 2016-09-29 Thu 1
# 21 2016 2016-09-30 Fri 1
# 22 2016 2016-10-01 Sat 1
# 23 2016 2016-10-02 Sun 1
# 24 2016 2016-10-03 Mon 1
# 25 2016 2016-10-04 Tue 1
# 26 2016 2016-10-05 Wed 1
# 27 2016 2016-10-06 Thu 1
# 28 2016 2016-10-07 Fri 1
# 29 2016 2016-10-08 Sat 2
# 30 2016 2016-10-09 Sun 2
# 31 2016 2016-10-10 Mon 2
# 32 2016 2016-10-11 Tue 2
# 33 2016 2016-10-12 Wed 2
# 34 2016 2016-10-13 Thu 2
# 35 2016 2016-10-14 Fri 2
# 36 2016 2016-10-15 Sat 2
# 37 2016 2016-10-16 Sun 2
# 38 2016 2016-10-17 Mon 2
# 39 2016 2016-10-18 Tue 2
# 40 2016 2016-10-19 Wed 2
Upvotes: 1