Agustín Indaco
Agustín Indaco

Reputation: 580

R: merge two datasets within range of dates

I have one dataset x that looks something like this:

id  |      date
 1  |   2014-02-04
 1  |   2014-03-15
 2  |   2014-02-04
 2  |   2014-03-15

And I would like to merge it with another dataset, y, by id and date. But with date from x being same as or preceding the date in dataset y for every observation. Dataset y looks like this:

id  |      date      |   value
 1  |   2014-02-07   |    100
 2  |   2014-02-04   |     20
 2  |   2014-03-22   |     80

So I would want my final dataset to be:

id  |      date.x    |    date.y      |   value
 1  |   2014-02-04   |  2014-02-07    |    100
 1  |   2014-03-15   |                |     
 2  |   2014-02-04   |  2014-02-04    |     20
 2  |   2014-03-15   |  2014-03-22    |     80

I really do not have a lead on how to approach something like this, any help is appreciated. Thanks!

Upvotes: 0

Views: 277

Answers (2)

Vitali Avagyan
Vitali Avagyan

Reputation: 1203

Another option is to full_join by year & month.

Firstly we need to add an additional column that extracts month and year from date column:

library(zoo)
library(dplyr)

xx <- x %>% 
  mutate(y_m = as.yearmon(date))

yy <- y %>% 
  mutate(y_m = as.yearmon(date))

Then we need to fully join by id and y_m:

out <- full_join(xx,yy, by = c("id","y_m")) %>% 
  select(-y_m)

> out
# A tibble: 4 x 4
     id date.x     date.y     value
  <dbl> <date>     <date>     <dbl>
1     1 2014-02-04 2014-02-07   100
2     1 2014-03-15 NA            NA
3     2 2014-02-04 2014-02-04    20
4     2 2014-03-15 2014-03-22    80

Upvotes: 0

Wimpel
Wimpel

Reputation: 27732

This is easy in data.table, using the roll-argument

First, craete sample data with actual dates

library( data.table )

DT1 <- fread("id  |      date
1  |   2014-02-04
1  |   2014-03-15
2  |   2014-02-04
2  |   2014-03-15")

DT2 <- fread("id  |      date      |   value
 1  |   2014-02-07   |    100
 2  |   2014-02-04   |     20
 2  |   2014-03-22   |     80")

DT1[, date := as.Date( date ) ]
DT2[, date := as.Date( date ) ]

now, perform an update join on DT1, where the columns date.y and value are the result of the (left rolling) join from DT2[ DT1, .( x.date, value), on = .(id, date), roll = -Inf ].
This code joins on two columns, id and date, the roll-argument -Inf is used on the last one (i.e. date). To make sure the date-value from DT2 is returned, and not the date from DT1, we call for x.date in stead of date (which returns the date -value from DT1)

#rolling update join
DT1[, c("date.y", "value") := DT2[ DT1, .( x.date, value), on = .(id, date), roll = -Inf ]][]

#    id       date     date.y value
# 1:  1 2014-02-04 2014-02-07   100
# 2:  1 2014-03-15       <NA>    NA
# 3:  2 2014-02-04 2014-02-04    20
# 4:  2 2014-03-15 2014-03-22    80

Upvotes: 2

Related Questions