user9292
user9292

Reputation: 1145

Group records with time interval overlap

I have a data frame (with N=16) contains ID (character), w_from (date), and w_to (date). Each record represent a task.

Here’s the data in R.

ID <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2)

w_from <- c("2010-01-01","2010-01-05","2010-01-29","2010-01-29",
            "2010-03-01","2010-03-15","2010-07-15","2010-09-10",
            "2010-11-01","2010-11-30","2010-12-15","2010-12-31",
            "2011-02-01","2012-04-01","2011-07-01","2011-07-01")

w_to <- c("2010-01-31","2010-01-15", "2010-02-13","2010-02-28",
          "2010-03-16","2010-03-16","2010-08-14","2010-10-10",
          "2010-12-01","2010-12-30","2010-12-20","2011-02-19",
          "2011-03-23","2012-06-30","2011-07-31","2011-07-06")

df <- data.frame(ID, w_from, w_to)
df$w_from <- as.Date(df$w_from)
df$w_to <- as.Date(df$w_to)

I need to generate a group number by ID for the records that their time intervals overlap. As an example, and in general terms, if record#1 overlaps with record#2, and record#2 overlaps with record#3, then record#1, record#2, and record#3 overlap.

Also, if record#1 overlaps with record#2 and record#3, but record#2 doesn't overlap with record#3, then record#1, record#2, record#3 are all overlap.

In the example above and for ID=1, the first four records overlap.

enter image description here

Here is the final output:

enter image description here

Also, if this can be done using dplyr, that would be great!

Upvotes: 0

Views: 68

Answers (1)

Edward
Edward

Reputation: 19384

Try this:

library(dplyr)
df %>% 
  group_by(ID) %>%
  arrange(w_from) %>% 
  mutate(group = 1+cumsum(
    cummax(lag(as.numeric(w_to), default = first(as.numeric(w_to)))) < as.numeric(w_from)))

# A tibble: 16 x 4
# Groups:   ID [2]
      ID w_from     w_to       group
   <dbl> <date>     <date>     <dbl>
 1     1 2010-01-01 2010-01-31     1
 2     1 2010-01-05 2010-01-15     1
 3     1 2010-01-29 2010-02-13     1
 4     1 2010-01-29 2010-02-28     1
 5     1 2010-03-01 2010-03-16     2
 6     1 2010-03-15 2010-03-16     2
 7     1 2010-07-15 2010-08-14     3
 8     1 2010-09-10 2010-10-10     4
 9     1 2010-11-01 2010-12-01     5
10     1 2010-11-30 2010-12-30     5
11     1 2010-12-15 2010-12-20     5
12     1 2010-12-31 2011-02-19     6
13     1 2011-02-01 2011-03-23     6
14     2 2011-07-01 2011-07-31     1
15     2 2011-07-01 2011-07-06     1
16     2 2012-04-01 2012-06-30     2

Upvotes: 1

Related Questions