Consolidate multiple time intervals within a group unit (dplyr or data.table solution preferred)

Question

I have a tricky little coding issue I was hoping someone might have a solution for.

I essentially have a very large dataset of stays (person ids, admits, discharges), >10 million.

library(dplyr)
library(lubridate)

dat <- read.csv(text="
personid, start, end
1, 2017-09-01, 2017-10-01
1, 2017-10-05, 2017-10-07
2, 2017-10-21, 2017-11-01
3, 2017-12-01, 2017-12-15
3, 2017-12-27, 2017-12-31") %>%
  transmute(
    personid,
    start = ymd(start), 
    end = ymd(end))

Each stay is non-overlapping, but we have a logic rule where if the stays are within 10 days of each other we want to consolidate them as one (i.e keep the earlier admit and the later discharge). So that the final dataset are unique stays at least 10 days from each other.

e.g.:

1, 2017-09-01, 2017-10-07
2, 2017-10-21, 2017-11-01
3, 2017-12-01, 2017-12-15
3, 2017-12-27, 2017-12-31

There are a few posts about overlapping intervals, but this is a little different: Consolidating set of time intervals, chains of intervals to single interval I think overly complex compared to what I need.

I was also hoping for a dplyr or data.table solution, though the group_by statement takes quite a while.

Consolidate multiple time intervals within a group unit (dplyr or data.table solution preferred)

Answers (1)

Related Questions