Reputation: 580
I have one dataset x
that looks something like this:
id | date
1 | 2014-02-04
1 | 2014-03-15
2 | 2014-02-04
2 | 2014-03-15
And I would like to merge it with another dataset, y
, by id
and date
. But with date
from x
being same as or preceding the date
in dataset y
for every observation. Dataset y
looks like this:
id | date | value
1 | 2014-02-07 | 100
2 | 2014-02-04 | 20
2 | 2014-03-22 | 80
So I would want my final dataset to be:
id | date.x | date.y | value
1 | 2014-02-04 | 2014-02-07 | 100
1 | 2014-03-15 | |
2 | 2014-02-04 | 2014-02-04 | 20
2 | 2014-03-15 | 2014-03-22 | 80
I really do not have a lead on how to approach something like this, any help is appreciated. Thanks!
Upvotes: 0
Views: 277
Reputation: 1203
Another option is to full_join
by year & month.
Firstly we need to add an additional column that extracts month and year from date
column:
library(zoo)
library(dplyr)
xx <- x %>%
mutate(y_m = as.yearmon(date))
yy <- y %>%
mutate(y_m = as.yearmon(date))
Then we need to fully join by id
and y_m
:
out <- full_join(xx,yy, by = c("id","y_m")) %>%
select(-y_m)
> out
# A tibble: 4 x 4
id date.x date.y value
<dbl> <date> <date> <dbl>
1 1 2014-02-04 2014-02-07 100
2 1 2014-03-15 NA NA
3 2 2014-02-04 2014-02-04 20
4 2 2014-03-15 2014-03-22 80
Upvotes: 0
Reputation: 27732
This is easy in data.table
, using the roll
-argument
First, craete sample data with actual dates
library( data.table )
DT1 <- fread("id | date
1 | 2014-02-04
1 | 2014-03-15
2 | 2014-02-04
2 | 2014-03-15")
DT2 <- fread("id | date | value
1 | 2014-02-07 | 100
2 | 2014-02-04 | 20
2 | 2014-03-22 | 80")
DT1[, date := as.Date( date ) ]
DT2[, date := as.Date( date ) ]
now, perform an update join on DT1, where the columns date.y
and value
are the result of the (left rolling) join from DT2[ DT1, .( x.date, value), on = .(id, date), roll = -Inf ]
.
This code joins on two columns, id
and date
, the roll-argument -Inf
is used on the last one (i.e. date
). To make sure the date
-value from DT2
is returned, and not the date
from DT1
, we call for x.date
in stead of date
(which returns the date
-value from DT1)
#rolling update join
DT1[, c("date.y", "value") := DT2[ DT1, .( x.date, value), on = .(id, date), roll = -Inf ]][]
# id date date.y value
# 1: 1 2014-02-04 2014-02-07 100
# 2: 1 2014-03-15 <NA> NA
# 3: 2 2014-02-04 2014-02-04 20
# 4: 2 2014-03-15 2014-03-22 80
Upvotes: 2