Overlap between multiple time-intervals columns by group

Question

A few days ago I opened this related thread: Time-interval overlap match by group

However, now I have to deal with the fact that I need to overlap multiple time-intervals columns and return the first row_number value of flag = 1 when this occurs.

For instance I have the following df:

id    flag  row_number         time_1                             time_2              result
 1      1       1       2001-04-01 UTC--2001-05-01 UTC  1960-01-01 UTC--1962-01-01 UTC  NA
 1      1       2       2007-08-01 UTC--2007-12-01 UTC  1980-01-01 UTC--1982-01-01 UTC  NA
 1      1       3       2010-03-01 UTC--2011-03-01 UTC  1949-01-01 UTC--1951-01-01 UTC  NA
 1      0       4       2001-04-15 UTC--2001-04-20 UTC  1981-01-01 UTC--1983-01-01 UTC  NA
 1      0       5       2001-04-17 UTC--2001-05-15 UTC  1959-01-01 UTC--1961-01-01 UTC  1
 1      0       6       2007-09-01 UTC--2007-12-01 UTC  1980-01-01 UTC--1983-01-01 UTC  2
 1      0       7       2011-01-01 UTC--2011-03-05 UTC  1994-01-01 UTC--1996-01-01 UTC  NA
 1      0       8       2018-01-01 UTC--2017-12-01 UTC  1949-01-01 UTC--1951-01-01 UTC  NA

Created with the following code:

library(dplyr)
library(purrr)
library(lubridate)

df <- data.frame(id=c(1, 1, 1, 1, 1, 1, 1, 1),     
             flag=c(1, 1, 1, 0, 0, 0, 0, 0),
             row_number=c(1,2,3,4,5,6,7,8),
             time_1=c(interval(ymd(20010401), ymd(20010501)),
                    interval(ymd(20070801), ymd(20071201)), 
                    interval(ymd(20100301), ymd(20110301)), 
                    interval(ymd(20010415), ymd(20010420)), 
                    interval(ymd(20010417), ymd(20010515)),
                    interval(ymd(20070801), ymd(20071201)),
                    interval(ymd(20110101), ymd(20110305)),
                    interval(ymd(20180101), ymd(20171201))),
             time_2=c(interval(ymd(19600101), ymd(19620101)),
                      interval(ymd(19800101), ymd(19820101)), 
                      interval(ymd(19490101), ymd(19510101)), 
                      interval(ymd(19810101), ymd(19830101)), 
                      interval(ymd(19590101), ymd(19610101)),
                      interval(ymd(19800101), ymd(19820101)),
                      interval(ymd(19940101), ymd(19960101)),
                      interval(ymd(19490101), ymd(19510101))),
             result = c(NA, NA, NA, NA, 1, 2, NA, NA))

This is, I need to find overlaps with time_1 and time_2 of rows with flag = 0 with all time_1 and time_2 variables of rows with flag = 1.

The result should be a column with the row_number value of first match between the row with flag 0 and a row with flag 1 that has an overlapping time_1 and time_2 intervals. For this purpose I have tried int_overlap() function from lubridate package.

With this code I am able to identify if there is time_1 overlap between one row with flag = 0 to any row with flag == 1, taking advantage of map_int() function

library(tidyverse)
library(lubridate)

df %>%
  group_by(id) %>%
  mutate(value = ifelse(flag == 0, map_int(time_1, ~ any(int_overlaps(.x, time_1[flag == 1]))), NA))

A related question which can be helpful: R Find overlap among time periods

EDIT: I want to obtain a column that identifies, with row_number variable, which is the first flag 1 row that have time_1 and time_2 overlapping values with flag 0 rows.

id    flag  row_number         time_1                             time_2              result
1      1       1       2001-04-01 UTC--2001-05-01 UTC  1960-01-01 UTC--1962-01-01 UTC  NA

1      0       5       2001-04-17 UTC--2001-05-15 UTC  1959-01-01 UTC--1961-01-01 UTC  1

For instance row_number 1 and 5 fulfill the criteria. The result is an integer column which indicates that row_number 5 (flag 0 row) have time_1 and time_2 overlap with row_number 1 (flag 1).

Hope this clarifies.

chinsoon12 · Accepted Answer

Here is an option using data.table by performing overlapping joins twice:

setkey(setDT(df), id, time_1_start, time_1_end)
ol1 <- foverlaps(df, df, nomatch=0L)[
    row_number!=i.row_number & i.flag==0L & flag==1L,
    .(id, irn=i.row_number, rn=row_number, flag=i.flag, 
        time_2_start=i.time_2_start, time_2_end=i.time_2_end)]

setkey(df, id, time_2_start, time_2_end)
setkey(ol1, id, time_2_start, time_2_end)
olaps <- foverlaps(ol1, df)[row_number!=irn & row_number==rn & i.flag==0L & flag==1L, 
    .(id, irn, xrn=row_number)]

df[olaps, on=.(id, row_number=irn), res := xrn]
setorder(df, row_number)
df

output:

   id flag row_number time_1_start time_1_end time_2_start time_2_end res
1:  1    1          1   2001-04-01 2001-05-01   1960-01-01 1962-01-01  NA
2:  1    1          2   2007-08-01 2007-12-01   1980-01-01 1982-01-01  NA
3:  1    1          3   2010-03-01 2011-03-01   1949-01-01 1951-01-01  NA
4:  1    0          4   2001-04-15 2001-04-20   1981-01-01 1983-01-01  NA
5:  1    0          5   2001-04-17 2001-05-15   1959-01-01 1961-01-01   1
6:  1    0          6   2007-08-01 2007-12-01   1980-01-01 1982-01-01   2
7:  1    0          7   2011-01-01 2011-03-05   1994-01-01 1996-01-01  NA
8:  1    0          8   2017-12-01 2018-01-01   1949-01-01 1951-01-01  NA

data:

library(data.table)
dtfun <- function(x) as.IDate(x, format="%Y%m%d")
df <- data.frame(id=c(1, 1, 1, 1, 1, 1, 1, 1),     
    flag=c(1, 1, 1, 0, 0, 0, 0, 0),
    row_number=c(1,2,3,4,5,6,7,8),
    time_1_start=dtfun(c("20010401","20070801","20100301","20010415",
        "20010417","20070801","20110101","20171201")),
    time_1_end=dtfun(c("20010501","20071201","20110301","20010420","
        20010515","20071201","20110305","20180101")),
    time_2_start=dtfun(c("19600101","19800101","19490101","19810101",
        "19590101","19800101","19940101","19490101")),
    time_2_end=dtfun(c("19620101","19820101","19510101","19830101",
        "19610101","19820101","19960101","19510101")))

Overlap between multiple time-intervals columns by group

Answers (2)

Related Questions