R function for creating uneven groups based on uneven dates

Question

I am trying to find an R function that can index groups iteratively, given a set of unevenly spaced dates, uneven group sizes, and by grouped cases. Here are example data:

> h
# A tibble: 20 x 2
      ID date      
        
 1     1 2021-01-07
 2     1 2021-01-11
 3     1 2021-01-15
 4     1 2021-01-16
 5     1 2021-01-21
 6     1 2021-01-26
 7     1 2021-02-04
 8     1 2021-02-08
 9     1 2021-02-13
10     1 2021-02-20
11     1 2021-02-23
12     1 2021-02-27
13     2 2021-01-05
14     2 2021-01-11
15     2 2021-02-02
16     2 2021-02-08
17     2 2021-02-08
18     2 2021-02-14
19     2 2021-02-17
20     2 2021-02-21

For each unique ID, I want to find the first date (chronologically) and create a group (i.e., group==1) for that case and any other rows within 7 days. For the next date after 7 days, create a second group (i.e., group==2) for that case and any others within the next 7 days. Note: the next date is not necessarily exactly 7 days after the initial date. Repeat this process for the remaining remaining cases to get the desired output:

# A tibble: 20 x 3
      ID date       group
         
 1     1 2021-01-07     1
 2     1 2021-01-11     1
 3     1 2021-01-15     2
 4     1 2021-01-16     2
 5     1 2021-01-21     2
 6     1 2021-01-26     3
 7     1 2021-02-04     4
 8     1 2021-02-08     4
 9     1 2021-02-13     5
10     1 2021-02-20     5
11     1 2021-02-23     6
12     1 2021-02-27     6
13     2 2021-01-05     1
14     2 2021-01-11     1
15     2 2021-02-02     2
16     2 2021-02-08     2
17     2 2021-02-08     2
18     2 2021-02-14     3
19     2 2021-02-17     3
20     2 2021-02-21     3

Using a rolling window function of 7 days will not work, as far as I can tell, as it will group the cases incorrectly. But I am wondering if a sort of custom rolling window function could be used? I would prefer a solution using dplyr, but other options would also work. Any help here is appreciated.

> dput(h)
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), date = structure(c(18634, 
18638, 18642, 18643, 18648, 18653, 18662, 18666, 18671, 18678, 
18681, 18685, 18632, 18638, 18660, 18666, 18666, 18672, 18675, 
18679), class = "Date")), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

G. Grothendieck · Accepted Answer

Define a function date1 which given the first date of the group of the prior row's point and the current row's date returns the date of the start of the current group -- that must be one of the two arguments. Then grouping by ID use Reduce to apply that to the dates in each ID and convert the result to factor and then to integer.

library(dplyr)

date1 <- function(prev, x) if (x > prev + 7) x else prev
h %>% 
  group_by(ID) %>%
  mutate(group = as.integer(factor(Reduce(date1, date, acc = TRUE)))) %>%
  ungroup

giving:

# A tibble: 20 x 3
      ID date       group
         
 1     1 2021-01-07     1
 2     1 2021-01-11     1
 3     1 2021-01-15     2
 4     1 2021-01-16     2
 5     1 2021-01-21     2
 6     1 2021-01-26     3
 7     1 2021-02-04     4
 8     1 2021-02-08     4
 9     1 2021-02-13     5
10     1 2021-02-20     5
11     1 2021-02-23     6
12     1 2021-02-27     6
13     2 2021-01-05     1
14     2 2021-01-11     1
15     2 2021-02-02     2
16     2 2021-02-08     2
17     2 2021-02-08     2
18     2 2021-02-14     3
19     2 2021-02-17     3
20     2 2021-02-21     3

R function for creating uneven groups based on uneven dates

Answers (2)

Related Questions