Reputation: 301
UPDATE: I think this mostly does what I want, but now I get multiple matches, which I suppose is a separate issue. It would be nice to be able to combine rolling joins with non-equi joins.
df2[copy(df1)[, `:=`(TargDate2 = TargDate + hours(4), TargDate1 = TargDate - hours(4))],
    `:=`(Value = i.Value, TargDate.df1 = TargDate),
    on = .(ID == ID, TargDate >= TargDate1, TargDate <= TargDate2)]
Is there a way to use the rolling join from the data.table package to match two data frames based on nearest datetime value within a certain constraint (e.g., 4 hours), but retain all values of the two tables (like: merge(..., all=T))?
library(data.table)
library(lubridate)
set.seed(1)
df1 <- data.frame(ID=sample(1:3,10, replace=T),TargDate=ymd_hms(Sys.time() + sort(sample(1e2:1e5, 10))),
Value=rnorm(10,10,0.5) )
set.seed(21)
df2 <- data.frame(ID=sample(1:3,20, replace=T), TargDate=ymd_hms(Sys.time() + sort(sample(1e2:1e5, 20))),
ValueMatch=rnorm(20,50,15) )
setDT(df1)
setDT(df2)
setkey(df2, ID, TargDate)[, dateMatch:=TargDate]
# This matches each row of df1: TargDate and Value come from df1,
# ValueMatch and dateMatch come from df2
df2[df1, roll="nearest"]
# 60 seconds * 60 minutes * 4 hours
four_hours <- 60*60*4
df2[df1, roll=-four_hours]
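To make the direction of each `roll` setting concrete, here is a minimal sketch on toy data. The tables `x` and `probe` and their values are made up for illustration and are not part of the data above.

```r
library(data.table)

# Hypothetical toy data: two observations in x, one probe time between them
x <- data.table(
  ID = 1L,
  TargDate = as.POSIXct(c("2019-06-05 10:00:00", "2019-06-05 16:00:00"), tz = "UTC"),
  ValueMatch = c(1, 2)
)
probe <- data.table(
  ID = 1L,
  TargDate = as.POSIXct("2019-06-05 13:30:00", tz = "UTC")
)

four_hours <- 60 * 60 * 4

# Positive roll: carry the previous x row forward (here 3.5 h earlier)
locf <- x[probe, on = .(ID, TargDate), roll = four_hours]

# Negative roll: carry the next x row backward (here 2.5 h later)
nocb <- x[probe, on = .(ID, TargDate), roll = -four_hours]

# "nearest": whichever x row is closer, with no distance limit
nearest <- x[probe, on = .(ID, TargDate), roll = "nearest"]
```

So `roll = -four_hours` only looks forward in time from each df1 row, while `roll = "nearest"` looks both ways but ignores the 4-hour constraint.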
Desired result: a data frame with all rows from df1 and df2, with the matched rows merged.
Upvotes: 2
Views: 411
Reputation: 27792
Here is a data.table way to match each row of df1 to a df2 row that falls at most 4 hours before it. It uses a non-equi join on a copy of df2, in which a new column TargDate2
(= TargDate + 4 hours) has been created to join on.
df1[copy(df2)[, TargDate2 := TargDate + hours(4)],
    `:=`(ValueMatch = i.ValueMatch, TargDate.df2 = TargDate),
    on = .(ID == ID, TargDate >= TargDate, TargDate <= TargDate2)]
#     ID            TargDate     Value ValueMatch        TargDate.df2
#  1:  1 2019-06-05 13:32:48 10.755891         NA                <NA>
#  2:  2 2019-06-05 14:21:47 10.194922         NA                <NA>
#  3:  2 2019-06-05 19:11:32  9.689380         NA                <NA>
#  4:  3 2019-06-05 19:18:21  8.892650   46.59552 2019-06-05 17:56:47
#  5:  1 2019-06-05 22:27:28 10.562465         NA                <NA>
#  6:  3 2019-06-06 03:42:42  9.977533   22.48528 2019-06-06 03:12:42
#  7:  3 2019-06-06 04:33:36  9.991905   43.88468 2019-06-06 04:26:16
#  8:  2 2019-06-06 06:00:34 10.471918         NA                <NA>
#  9:  2 2019-06-06 06:13:10 10.410611   63.67443 2019-06-06 06:10:11
# 10:  1 2019-06-06 12:10:15 10.296951   51.20187 2019-06-06 08:45:39
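If you want a single nearest match per row (in either direction) and also want to keep the unmatched rows of both tables, one possible approach is a `roll = "nearest"` join followed by a distance filter and an append of the unused rows. The following is only a sketch under assumptions: the toy tables `a` and `b`, their values, and the helper column `b_row` are made up for illustration and are not the data from the question.

```r
library(data.table)

# Hypothetical toy data standing in for df1 (a) and df2 (b)
a <- data.table(
  ID = c(1L, 1L, 2L),
  TargDate = as.POSIXct(c("2019-06-05 10:00:00", "2019-06-05 20:00:00",
                          "2019-06-05 12:00:00"), tz = "UTC"),
  Value = c(10, 11, 12)
)
b <- data.table(
  ID = c(1L, 1L, 2L, 3L),
  TargDate = as.POSIXct(c("2019-06-05 09:00:00", "2019-06-05 11:30:00",
                          "2019-06-05 14:00:00", "2019-06-05 12:00:00"), tz = "UTC"),
  ValueMatch = c(50, 51, 52, 53)
)

# Keep b's own timestamp and a row id, because the join overwrites TargDate
b[, `:=`(dateMatch = TargDate, b_row = .I)]

# Step 1: one nearest b row per a row (no distance limit yet)
m <- b[a, on = .(ID, TargDate), roll = "nearest"]

# Step 2: blank out matches that are more than 4 hours away
four_hours <- 60 * 60 * 4
too_far <- m[, abs(as.numeric(difftime(TargDate, dateMatch, units = "secs"))) > four_hours]
m[which(too_far), `:=`(ValueMatch = NA_real_,
                       dateMatch = as.POSIXct(NA),
                       b_row = NA_integer_)]

# Step 3: append the b rows that no a row ended up using
unused_rows <- setdiff(b$b_row, m$b_row)
unused <- b[unused_rows, .(ID, ValueMatch, dateMatch, b_row)]
out <- rbindlist(list(m, unused), use.names = TRUE, fill = TRUE)
```

In `out`, rows that come only from `a` have `NA` in `ValueMatch` and `dateMatch`, and rows that come only from `b` have `NA` in `TargDate` and `Value`. Note that a single `b` row can still be the nearest match for several `a` rows; this sketch does not enforce one-to-one matching.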
Upvotes: 0