Compare timestamps based on multiple criteria from multiple rows and columns

Question

I have two data frames with timestamps (in as.POSIXct, format="%Y-%m-%d %H:%M:%S") as below.

df_ID1
 ID         DATETIME               TIMEDIFF                 EV
  A         2019-03-26 06:13:00    2019-03-26 00:13:00      1
  B         2019-04-03 08:00:00    2019-04-03 02:00:00      1
  B         2019-04-04 12:35:00    2019-04-04 06:35:00      1

df_ID0
  ID         DATETIME                   
  A         2019-03-26 00:02:00         
  A         2019-03-26 04:55:00     
  A         2019-03-26 11:22:00
  B         2019-04-02 20:43:00
  B         2019-04-04 11:03:00
  B         2019-04-06 03:12:00

I want to compare the DATETIME in df_ID1 with the DATETIME in df_ID0 that is with the same ID and the DATETIME is "smaller than but closest to" the one in df_ID1,

For the pair in two data frames that matches, I want to further compare the TIMEDIFF in df_ID1 to the matched DATETIME in df_ID0, if TIMEDIFF in df_ID1 greater than the DATETIME in df_ID0, change EV 1 to 4 in df_ID1.

My desired result is

 df_ID1
 ID         DATETIME               TIMEDIFF                 EV
  A         2019-03-26 06:13:00    2019-03-26 00:13:00      1
  B         2019-04-03 08:00:00    2019-04-03 02:00:00      4
  B         2019-04-04 12:35:00    2019-04-04 06:35:00      1

I've checked how to compare timestamps and calculate the time difference, also how to change values based on criteria... But I cannot find anything to select the "smaller than but closest to" timestamps and cannot figure out how to apply all these logic too..

Any help would be appreciate!

Ronak Shah · Accepted Answer

One option using nested mapply, is to first split df_ID1 and df_ID0 based on ID. Calculate the difference in time between each value in df_ID1 with that of df_ID0 of same ID. Get the index of "smaller than but closest to" and store it in inds and change the value to 4 if the value of corresponding TIMEDIFF column is greater than the matched DATETIME value.

df_ID1$EV[unlist(mapply(function(x, y) {
                   mapply(function(p, q) {
                     vals = as.numeric(difftime(p, y$DATETIME))
                     inds = which(vals == min(vals[vals > 0]))
                     q > y$DATETIME[inds]
                    }, x$DATETIME, x$TIMEDIFF)
                 }, split(df_ID1, df_ID1$ID), split(df_ID0, df_ID0$ID)))] <- 4


df_ID1
#  ID            DATETIME            TIMEDIFF EV
#1  A 2019-03-26 06:13:00 2019-03-26 00:13:00  1
#2  B 2019-04-03 08:00:00 2019-04-03 02:00:00  4
#3  B 2019-04-04 12:35:00 2019-04-04 06:35:00  1

data

df_ID0 <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L), 
.Label = c("A", 
"B"), class = "factor"), DATETIME = structure(c(1553529720, 1553547300, 
1553570520, 1554208980, 1554346980, 1554491520), class = c("POSIXct", 
"POSIXt"), tzone = "")), row.names = c(NA, -6L), class = "data.frame")

df_ID1 <- structure(list(ID = structure(c(1L, 2L, 2L), .Label = c("A", 
"B"), class = "factor"), DATETIME = structure(c(1553551980, 1554249600, 
1554352500), class = c("POSIXct", "POSIXt"), tzone = ""), TIMEDIFF = 
structure(c(1553530380, 
1554228000, 1554330900), class = c("POSIXct", "POSIXt"), tzone = ""), 
EV = c(1, 1, 1)), row.names = c(NA, -3L), class = "data.frame")

Compare timestamps based on multiple criteria from multiple rows and columns

Answers (2)

Related Questions