Matching timestamps in two dataframes using dplyr

Question

So I have a dataframe like this

DF1

ID    timestamp              value
1     2018-07-31 20:00:00    45555
1     2018-08-01 06:50:00    333
1     2018-08-01 07:00:00    322
2     2018-08-01 13:00:00    1222
2     2018-08-01 22:00:00    1111
3     2018-08-01 11:00:00    0
3     2018-08-02 08:00:00    22222

And another one like this

DF2

          startTime             endTime   ID    
2018-08-01 06:45:00 2018-08-01 09:49:00   1
2019-12-02 11:01:00 2019-12-02 11:02:00   2
2018-08-02 07:59:00 2018-08-02 08:50:00   2
2018-08-02 07:59:00 2018-08-02 08:50:00   3

So I want to create a new column using mutate which mutates DF2 that tells me if any time in between the startTime and endTime appears in DF1 that matches the ID.

So in DF2 if we have a date like 8/2/2018 and the time span is 7:59AM - 8:50AM , if any timestamp within this interval inclusive (2018-08-02 7:59:00, 2018-08-02 8:00:00, 2018-08-02 8:01:00...) appears in DF1 then a variable in DF2 reads 1, else 0. This is only the case if the ID matches. Does not matter how many times, atleast once is what matters.

Final version of DF2 should look like.

          startTime             endTime   ID   match
2018-08-01 06:45:00 2018-08-01 09:49:00   1    1
2019-12-02 11:01:00 2019-12-02 11:02:00   2    0
2018-08-02 07:59:00 2018-08-02 08:50:00   2    0
2018-08-02 07:59:00 2018-08-02 08:50:00   3    1

Ideally I would like to use BETWEEN() and remain in the tidyverse.

akrun · Accepted Answer

We could use a data.table non-equi join on the DateTime columns along with 'ID' column, create the match by checking if the number of rows (.N) is greater than 0

library(data.table)
DF2$match <- 0L
setDT(DF2)[DF1,  match := +(.N > 0),on = .(ID, 
        startTime < timestamp, endTime > timestamp), by = .EACHI]
DF2
#             startTime             endTime ID match
#1: 2018-08-01 06:45:00 2018-08-01 09:49:00  1     1
#2: 2019-12-02 11:01:00 2019-12-02 11:02:00  2     0
#3: 2018-08-02 07:59:00 2018-08-02 08:50:00  2     0
#4: 2018-08-02 07:59:00 2018-08-02 08:50:00  3     1

data

DF1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L),
 timestamp = structure(c(1533085200, 
1533124200, 1533124800, 1533146400, 1533178800, 1533139200, 1533214800
), class = c("POSIXct", "POSIXt"), tzone = ""), value = c(45555L, 
333L, 322L, 1222L, 1111L, 0L, 22222L)), row.names = c(NA, -7L
), class = "data.frame")

DF2 <- structure(list(startTime = structure(c(1533123900, 1575306060, 
1533214740, 1533214740), class = c("POSIXct", "POSIXt"), tzone = ""), 
    endTime = structure(c(1533134940, 1575306120, 1533217800, 
    1533217800), class = c("POSIXct", "POSIXt"), tzone = ""), 
    ID = c(1L, 2L, 2L, 3L)), row.names = c(NA, -4L), class = "data.frame")

Matching timestamps in two dataframes using dplyr

Answers (2)

data

Related Questions