paulahan
paulahan

Reputation: 17

R: How to join two data tables by nearest single time-date?

I hope someone might help me on this. I have two tables that need to be joined on the nearest date. Below are some example data:

new <- data.table( date = as.Date( c( "2016-03-02", "2016-03-04", 
                                     "2016-04-02" ) ), data.new = c( "t","u","v" ) )

old <- data.table( date = as.Date( c( "2016-03-01", "2016-03-07", 
                                     "2016-04-02", "2015-03-02" ) ), 
               data.old = c( "a","b","c","d" ) )

I am looking for some solutions using the data.table package. I've found this solution:

setkeyv(new, "date")
setkeyv(old, "date")
combined <- new[old, roll = "nearest"]

The solution looks like this:

> new
         date data.new
1: 2016-03-02        t
2: 2016-03-04        u
3: 2016-04-02        v

> old
         date data.old
1: 2016-03-01        a
2: 2016-03-07        b
3: 2016-04-02        c
4: 2015-03-02        d

> combined
         date data.new data.old
1: 2015-03-02        t        d
2: 2016-03-01        t        a
3: 2016-03-07        u        b
4: 2016-04-02        v        c

But this is not exactly what i want to achieve, see desired output below: In this particular case, dates "2015-03-02" and "2016-03-01" don't have a corresponding match in "new". Only the nearest date "2016-03-01" gets a matched value in the end, leaving the matched value for "2015-03-02" as NA.

> combined
         date data.new data.old
1: 2015-03-02        NA       d
2: 2016-03-01        t        a
3: 2016-03-07        u        b
4: 2016-04-02        v        c

I would appreciate your help! A follow-up question would be, what if there are multiple IDs? That means, what if the data need to be grouped? Thanks!

Upvotes: 0

Views: 229

Answers (1)

Peace Wang
Peace Wang

Reputation: 2419

The imperfect roll result is due to the different length of old and new. Then the solution is very intuitive. In order to let them have same size, I will rbind some row to the head of the shorter data.table new.

new <- data.table( date = as.Date( c( "2016-03-02", "2016-03-04", 
                                     "2016-04-02" ) ), data.new = c( "t","u","v" ) )

old <- data.table( date = as.Date( c( "2016-03-01", "2016-03-07", 
                                     "2016-04-02", "2015-03-02" ) ), 
               data.old = c( "a","b","c","d" ) )

head_row <- data.table(date = old[,min(date)],
                       data.new = NA)
new <- rbind(head_row,new)
setkey(new,date)
setkey(old,date)
new[old, roll = "nearest"]

Upvotes: 0

Related Questions