Flag consecutive dates by group - R

Question

Below is an example of my data (Room and Date). I would like to generate variables Goal1 , Goal2 and Goal3. Every time there is a gap in the Date variable means that the room was closed. My goal is to identify consecutive dates by room.

  Room    Date         Goal1     Goal2       Goal3
1 Upper A 2021-01-01   1         2021-01-01  2021-01-02
2 Upper A 2021-01-02   1         2021-01-01  2021-01-02
3 Upper A 2021-01-05   2         2021-01-05  2021-01-05
4 Upper A 2021-01-10   3         2021-01-10  2021-01-10
5 Upper B 2021-01-01   1         2021-01-01  2021-01-01
6 Upper B 2021-02-05   2         2021-02-05  2021-02-07
7 Upper B 2021-02-06   2         2021-02-05  2021-02-07
8 Upper B 2021-02-07   2         2021-02-05  2021-02-07

df <- data.frame("Area" = c("Upper A", "Upper A", "Upper A", "Upper A",
                            "Upper B", "Upper B", "Upper B", "Upper B"),
                "Date" = c("1/1/2021", "1/2/2021", "1/5/2021", "1/10/2021",
                           "1/1/2021", "2/5/2021", "2/6/2021", "2/7/2021"))
df$Date <- as.Date(df$Date, format = "%m/%d/%Y")

Thank you, Marvin

stribstrib · Accepted Answer

# Original Data (Note I use a different method to convert the Date to date format below)
df <- data.frame("Area" = c("Upper A", "Upper A", "Upper A", "Upper A",
                                "Upper B", "Upper B", "Upper B", "Upper B"),
                    "Date" = c("1/1/2021", "1/2/2021", "1/5/2021", "1/10/2021",
                               "1/1/2021", "2/5/2021", "2/6/2021", "2/7/2021"))

Here's one possible solution. I created an extra column with a nested if_else() statement that identifies the start date of each 'group' of consecutive dates. I left the extra column in the final data set to better illustrate what's happening in the code.

library(lubridate) # I suggest lubridate for working with dates
# It sticks with the dplyr/tidyverse syntax
    
df.grouped <- df %>% 
  mutate(Date = mdy(Date)) %>% #convert characters to actual dates in month-day-year format
  arrange(Area, Date) %>% # arrange data in order by area, then Date
  group_by(Area) %>% # group by Area
  mutate(group_start = if_else(row_number() == 1, 1, #group_start gives the start of consecutive groups of days a 1, other dates a 0
                            if_else(Date-lag(Date) == 1, 0, 1)),
         group_id = cumsum(group_start)) %>%  #group_id cumulatively adds the group_start column, effectively generating a new id # for each group start day
  group_by(Area, group_id) %>% # re-group the data by Area AND group_id
  mutate(start_date = min(Date), #find the min (start) and max (end) dates for each group
         end_date = max(Date))

final result:

df.grouped

> df.grouped
# A tibble: 8 x 6
# Groups:   Area, group_id [5]
  Area    Date       group_start group_id start_date end_date  
                              
1 Upper A 2021-01-01           1        1 2021-01-01 2021-01-02
2 Upper A 2021-01-02           0        1 2021-01-01 2021-01-02
3 Upper A 2021-01-05           1        2 2021-01-05 2021-01-05
4 Upper A 2021-01-10           1        3 2021-01-10 2021-01-10
5 Upper B 2021-01-01           1        1 2021-01-01 2021-01-01
6 Upper B 2021-02-05           1        2 2021-02-05 2021-02-07
7 Upper B 2021-02-06           0        2 2021-02-05 2021-02-07
8 Upper B 2021-02-07           0        2 2021-02-05 2021-02-07

Flag consecutive dates by group - R

Answers (2)

Related Questions