Collapsing consecutive dates into a single row

Question

I saw many threads about how to merge consecutive dates into a single row, and tried few of them (including this and using lead from dplyr), but so far didn't find thread that specifically answers my question.

Here's what my data looks like:

df <- data.frame(
    id = c("A", "A", "A", "B", "B", "C", "C", "C"),
    start = as.Date(c("2013-05-21", "2014-03-17", "2014-12-12", "2009-03-08", 
                      "2011-07-30", "2008-10-07", "2009-11-21", "2010-12-01")),
    end = as.Date(c("2014-03-16", "2014-12-11", NA, "2011-07-14", 
                    NA, "2009-11-20", NA, NA)),
    status = c("expired", "expired", "active", "expired", 
               "active", "expired", "expired", "active")
    )

And below is my desired output:

id          start          end          status
A           2013-05-21     NA           active
B           2009-03-08     2011-07-14   expired
B           2011-07-30     NA           active
C           2008-10-07     NA           active

So what I want to do is threefold:

1) If rows are consecutive, i.e. end date + 1 is the start date of the next row, I'd like to collapse them into a single row (as in id A)

2) If rows are not consecutive, i.e. end date + 1 is NOT the start date of the next row, I'd like to keep them separate (as in id B)

3) If "expired" rows do not have end date, I'd still like to collapse them into a single row (as in id C)

Any help would be greatly appreciated!

arg0naut91 · Accepted Answer

You could go for something like:

library(tidyverse)

df %>%
  group_by(id) %>%
  mutate(
    end = if_else(is.na(end), lead(start), end),
    flag = if_else(start <= lag(end) + 1, 0, 1),
    flag = if_else(is.na(flag), 0, flag),
    group = cumsum(flag),
    flag = NULL
  ) %>%
  group_by(id, group) %>%
  mutate(
    start = first(start),
    end = last(end),
    status = last(status)
  ) %>% ungroup() %>% 
  distinct(id, start, end, status)

Output:

# A tibble: 4 x 4
  id    start      end        status 
               
1 A     2013-05-21 NA         active 
2 B     2009-03-08 2011-07-14 expired
3 B     2011-07-30 NA         active 
4 C     2008-10-07 NA         active

Collapsing consecutive dates into a single row

Answers (1)

Related Questions