rolling joins on multiple matches

Question

Hey I am not sure if i fully understand the rolling join solution specially when it comes to duplicate entries in table.

Here is a sample data :

new <- data.table( date = as.POSIXct( c( "2016-03-01 12:20:00", "2016-03-01 12:20:00", "2016-04-02 12:20:00" ) ), data.new = c( "sample1","sample2","sample3" ) )
new
old <- data.table( date = as.POSIXct( c( "2016-03-02 12:20:00", "2016-03-07 12:20:00", "2016-04-02 12:20:00", "2015-03-02 12:20:00" ) ), data.old = c( "a","b","c","d" ) )
old
setkey( new, date )
setkey( old, date )
new[old,roll=-Inf]

The output looks like :

                       date data.new data.old
1: 2015-03-02 12:20:00  sample1        d
2: 2016-03-02 12:20:00  sample3        a
3: 2016-03-07 12:20:00  sample3        b
4: 2016-04-02 12:20:00  sample3        c

I am getting the match to work properly except for the case of data.old 'd'. Since table new has first two time stamps similar, roll is just able to map to the first occurrence ideally i wanted a duplicated row for data.old "d" with data.new column as "sample2"

                      date data.new data.old
  new: 2015-03-02 12:20:00  sample2        d
    1: 2015-03-02 12:20:00  sample1        d
    2: 2016-03-02 12:20:00  sample3        a
    3: 2016-03-07 12:20:00  sample3        b
    4: 2016-04-02 12:20:00  sample3        c

Any guidance is appreciated.

Jaap · Accepted Answer

Using:

on <- old[new, roll = Inf, .(x.date, data.new, data.old)]
no <- new[old, roll = -Inf]

unique(rbindlist(list(on, no)))

gives:

                x.date data.new data.old
1: 2015-03-02 12:20:00  sample1        d
2: 2015-03-02 12:20:00  sample2        d
3: 2016-04-02 12:20:00  sample3        c
4: 2016-03-02 12:20:00  sample3        a
5: 2016-03-07 12:20:00  sample3        b

rolling joins on multiple matches

Answers (1)

Related Questions