PSraj
PSraj

Reputation: 229

rolling joins on multiple matches

Hey I am not sure if i fully understand the rolling join solution specially when it comes to duplicate entries in table.

Here is a sample data :

new <- data.table( date = as.POSIXct( c( "2016-03-01 12:20:00", "2016-03-01 12:20:00", "2016-04-02 12:20:00" ) ), data.new = c( "sample1","sample2","sample3" ) )
new
old <- data.table( date = as.POSIXct( c( "2016-03-02 12:20:00", "2016-03-07 12:20:00", "2016-04-02 12:20:00", "2015-03-02 12:20:00" ) ), data.old = c( "a","b","c","d" ) )
old
setkey( new, date )
setkey( old, date )
new[old,roll=-Inf]

The output looks like :

                       date data.new data.old
1: 2015-03-02 12:20:00  sample1        d
2: 2016-03-02 12:20:00  sample3        a
3: 2016-03-07 12:20:00  sample3        b
4: 2016-04-02 12:20:00  sample3        c

I am getting the match to work properly except for the case of data.old 'd'. Since table new has first two time stamps similar, roll is just able to map to the first occurrence ideally i wanted a duplicated row for data.old "d" with data.new column as "sample2"

                      date data.new data.old
  new: 2015-03-02 12:20:00  sample2        d
    1: 2015-03-02 12:20:00  sample1        d
    2: 2016-03-02 12:20:00  sample3        a
    3: 2016-03-07 12:20:00  sample3        b
    4: 2016-04-02 12:20:00  sample3        c

Any guidance is appreciated.

Upvotes: 1

Views: 300

Answers (1)

Jaap
Jaap

Reputation: 83245

Using:

on <- old[new, roll = Inf, .(x.date, data.new, data.old)]
no <- new[old, roll = -Inf]

unique(rbindlist(list(on, no)))

gives:

                x.date data.new data.old
1: 2015-03-02 12:20:00  sample1        d
2: 2015-03-02 12:20:00  sample2        d
3: 2016-04-02 12:20:00  sample3        c
4: 2016-03-02 12:20:00  sample3        a
5: 2016-03-07 12:20:00  sample3        b

Upvotes: 1

Related Questions