Reputation: 77
I saw many threads about how to merge consecutive dates into a single row, and tried few of them (including this and using lead
from dplyr
), but so far didn't find thread that specifically answers my question.
Here's what my data looks like:
df <- data.frame(
id = c("A", "A", "A", "B", "B", "C", "C", "C"),
start = as.Date(c("2013-05-21", "2014-03-17", "2014-12-12", "2009-03-08",
"2011-07-30", "2008-10-07", "2009-11-21", "2010-12-01")),
end = as.Date(c("2014-03-16", "2014-12-11", NA, "2011-07-14",
NA, "2009-11-20", NA, NA)),
status = c("expired", "expired", "active", "expired",
"active", "expired", "expired", "active")
)
And below is my desired output:
id start end status
A 2013-05-21 NA active
B 2009-03-08 2011-07-14 expired
B 2011-07-30 NA active
C 2008-10-07 NA active
So what I want to do is threefold:
1) If rows are consecutive, i.e. end date + 1 is the start date of the next row, I'd like to collapse them into a single row (as in id A)
2) If rows are not consecutive, i.e. end date + 1 is NOT the start date of the next row, I'd like to keep them separate (as in id B)
3) If "expired" rows do not have end date, I'd still like to collapse them into a single row (as in id C)
Any help would be greatly appreciated!
Upvotes: 1
Views: 391
Reputation: 14764
You could go for something like:
library(tidyverse)
df %>%
group_by(id) %>%
mutate(
end = if_else(is.na(end), lead(start), end),
flag = if_else(start <= lag(end) + 1, 0, 1),
flag = if_else(is.na(flag), 0, flag),
group = cumsum(flag),
flag = NULL
) %>%
group_by(id, group) %>%
mutate(
start = first(start),
end = last(end),
status = last(status)
) %>% ungroup() %>%
distinct(id, start, end, status)
Output:
# A tibble: 4 x 4
id start end status
<fct> <date> <date> <fct>
1 A 2013-05-21 NA active
2 B 2009-03-08 2011-07-14 expired
3 B 2011-07-30 NA active
4 C 2008-10-07 NA active
Upvotes: 2