Reputation: 229
Hey I am not sure if i fully understand the rolling join solution specially when it comes to duplicate entries in table.
Here is a sample data :
new <- data.table( date = as.POSIXct( c( "2016-03-01 12:20:00", "2016-03-01 12:20:00", "2016-04-02 12:20:00" ) ), data.new = c( "sample1","sample2","sample3" ) )
new
old <- data.table( date = as.POSIXct( c( "2016-03-02 12:20:00", "2016-03-07 12:20:00", "2016-04-02 12:20:00", "2015-03-02 12:20:00" ) ), data.old = c( "a","b","c","d" ) )
old
setkey( new, date )
setkey( old, date )
new[old,roll=-Inf]
The output looks like :
date data.new data.old
1: 2015-03-02 12:20:00 sample1 d
2: 2016-03-02 12:20:00 sample3 a
3: 2016-03-07 12:20:00 sample3 b
4: 2016-04-02 12:20:00 sample3 c
I am getting the match to work properly except for the case of data.old 'd'. Since table new has first two time stamps similar, roll is just able to map to the first occurrence ideally i wanted a duplicated row for data.old "d" with data.new column as "sample2"
date data.new data.old
new: 2015-03-02 12:20:00 sample2 d
1: 2015-03-02 12:20:00 sample1 d
2: 2016-03-02 12:20:00 sample3 a
3: 2016-03-07 12:20:00 sample3 b
4: 2016-04-02 12:20:00 sample3 c
Any guidance is appreciated.
Upvotes: 1
Views: 300
Reputation: 83245
Using:
on <- old[new, roll = Inf, .(x.date, data.new, data.old)]
no <- new[old, roll = -Inf]
unique(rbindlist(list(on, no)))
gives:
x.date data.new data.old
1: 2015-03-02 12:20:00 sample1 d
2: 2015-03-02 12:20:00 sample2 d
3: 2016-04-02 12:20:00 sample3 c
4: 2016-03-02 12:20:00 sample3 a
5: 2016-03-07 12:20:00 sample3 b
Upvotes: 1