Simon Page
Simon Page

Reputation: 17

Using cut.Date and Starting Weeks on Saturday in R

I currently am aggregating data in exactly 4-week intervals, but I need the weeks to start on Saturdays. I am using the cut.Date line below on below's reproducible example to create a column indicating each 4-week window, but I am struggling to find a resource (in either dplyr or lubridate) to aggregate in exactly 4-week intervals beginning on Saturday.

Here is what I have:

library(tidyverse)

# Example

example <- structure(list(
  yr = c(2016, 2016, 2016, 2016, 2016, 2016, 2016, 
         2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 
         2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 
         2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016
  ), 
  date = structure(
    c(17054, 17055, 17056, 17057, 17058, 17059, 
      17060, 17061, 17062, 17063, 17064, 17065, 17066, 17067, 17068, 
      17069, 17070, 17071, 17072, 17073, 17074, 17075, 17076, 17077, 
      17078, 17079, 17080, 17081, 17082, 17083, 17084, 17085, 17086, 
      17087, 17088, 17089, 17090, 17091, 17092, 17093), class = "Date"), 
  day_of_week = structure(
    c(7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
      1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 
      2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 
      3L, 4L), 
    .Label = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", 
               "Sat"), class = c("ordered", "factor"))), 
  row.names = c(NA, -40L), 
  class = c("tbl_df", "tbl", "data.frame"))



# Using this in a mutate to generate the week
example %>% 
  mutate(wk = cut.Date(date, breaks = "4 week", labels = FALSE)) %>%
  print()

And I would like it to look like the following:

# Ideal output
output <- structure(list(
  yr = c(2016, 2016, 2016, 2016, 2016, 2016, 2016, 
         2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 
         2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 
         2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016), 
  date = structure(
    c(17054, 17055, 17056, 17057, 17058, 17059, 
      17060, 17061, 17062, 17063, 17064, 17065, 17066, 17067, 17068, 
      17069, 17070, 17071, 17072, 17073, 17074, 17075, 17076, 17077, 
      17078, 17079, 17080, 17081, 17082, 17083, 17084, 17085, 17086, 
      17087, 17088, 17089, 17090, 17091, 17092, 17093), class = "Date"), 
  day_of_week = structure(
    c(7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
      1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 
      2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 
      3L, 4L),  
    .Label = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"), 
    class = c("ordered", "factor")), 
  wk = c(1L, 1L, 1L, 
         1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
         1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
         2L, 2L, 2L, 2L, 2L, 2L, 2L)), 
  row.names = c(NA, -40L),  
  class = c("tbl_df", "tbl", "data.frame"))

Again, the first 4-week period goes from 2016-09-10 to 2016-10-08 when using Saturdays, which I need. Hit me with any questions you may have, and thanks in advance!

Upvotes: 0

Views: 520

Answers (1)

r2evans
r2evans

Reputation: 160447

We'll use integer-division (%/%) to determine how many 28-day periods occurred between "today's date" and the first "Fri" in the data.

Up front, your output both starts and ends on a Saturday, which doesn't seem to cycle consistently. I'll assume that you want to start each 4-week cycle on a Saturday, so it'll end on a Friday.

base R

example$wk <- with(example, as.numeric(date - date[which.max(day_of_week == "Sat")]) %/% 28 + 1L)
print(example, n = 40)
# # A tibble: 40 x 4
#       yr date       day_of_week    wk
#    <dbl> <date>     <ord>       <dbl>
#  1  2016 2016-09-10 Sat             1
#  2  2016 2016-09-11 Sun             1
#  3  2016 2016-09-12 Mon             1
#  4  2016 2016-09-13 Tue             1
#  5  2016 2016-09-14 Wed             1
#  6  2016 2016-09-15 Thu             1
#  7  2016 2016-09-16 Fri             1
#  8  2016 2016-09-17 Sat             1
#  9  2016 2016-09-18 Sun             1
# 10  2016 2016-09-19 Mon             1
# 11  2016 2016-09-20 Tue             1
# 12  2016 2016-09-21 Wed             1
# 13  2016 2016-09-22 Thu             1
# 14  2016 2016-09-23 Fri             1
# 15  2016 2016-09-24 Sat             1
# 16  2016 2016-09-25 Sun             1
# 17  2016 2016-09-26 Mon             1
# 18  2016 2016-09-27 Tue             1
# 19  2016 2016-09-28 Wed             1
# 20  2016 2016-09-29 Thu             1
# 21  2016 2016-09-30 Fri             1
# 22  2016 2016-10-01 Sat             1
# 23  2016 2016-10-02 Sun             1
# 24  2016 2016-10-03 Mon             1
# 25  2016 2016-10-04 Tue             1
# 26  2016 2016-10-05 Wed             1
# 27  2016 2016-10-06 Thu             1
# 28  2016 2016-10-07 Fri             1
# 29  2016 2016-10-08 Sat             2
# 30  2016 2016-10-09 Sun             2
# 31  2016 2016-10-10 Mon             2
# 32  2016 2016-10-11 Tue             2
# 33  2016 2016-10-12 Wed             2
# 34  2016 2016-10-13 Thu             2
# 35  2016 2016-10-14 Fri             2
# 36  2016 2016-10-15 Sat             2
# 37  2016 2016-10-16 Sun             2
# 38  2016 2016-10-17 Mon             2
# 39  2016 2016-10-18 Tue             2
# 40  2016 2016-10-19 Wed             2

dplyr

library(dplyr)
example %>%
  mutate(
    wk = as.numeric(date - date[which.max(day_of_week == "Sat")]) %/% 28 + 1L
  ) %>%
  print(n=99)
# # A tibble: 40 x 4
#       yr date       day_of_week    wk
#    <dbl> <date>     <ord>       <dbl>
#  1  2016 2016-09-10 Sat             1
#  2  2016 2016-09-11 Sun             1
#  3  2016 2016-09-12 Mon             1
#  4  2016 2016-09-13 Tue             1
#  5  2016 2016-09-14 Wed             1
#  6  2016 2016-09-15 Thu             1
#  7  2016 2016-09-16 Fri             1
#  8  2016 2016-09-17 Sat             1
#  9  2016 2016-09-18 Sun             1
# 10  2016 2016-09-19 Mon             1
# 11  2016 2016-09-20 Tue             1
# 12  2016 2016-09-21 Wed             1
# 13  2016 2016-09-22 Thu             1
# 14  2016 2016-09-23 Fri             1
# 15  2016 2016-09-24 Sat             1
# 16  2016 2016-09-25 Sun             1
# 17  2016 2016-09-26 Mon             1
# 18  2016 2016-09-27 Tue             1
# 19  2016 2016-09-28 Wed             1
# 20  2016 2016-09-29 Thu             1
# 21  2016 2016-09-30 Fri             1
# 22  2016 2016-10-01 Sat             1
# 23  2016 2016-10-02 Sun             1
# 24  2016 2016-10-03 Mon             1
# 25  2016 2016-10-04 Tue             1
# 26  2016 2016-10-05 Wed             1
# 27  2016 2016-10-06 Thu             1
# 28  2016 2016-10-07 Fri             1
# 29  2016 2016-10-08 Sat             2
# 30  2016 2016-10-09 Sun             2
# 31  2016 2016-10-10 Mon             2
# 32  2016 2016-10-11 Tue             2
# 33  2016 2016-10-12 Wed             2
# 34  2016 2016-10-13 Thu             2
# 35  2016 2016-10-14 Fri             2
# 36  2016 2016-10-15 Sat             2
# 37  2016 2016-10-16 Sun             2
# 38  2016 2016-10-17 Mon             2
# 39  2016 2016-10-18 Tue             2
# 40  2016 2016-10-19 Wed             2

Upvotes: 1

Related Questions