Reputation: 193
I have two data frames
A:
customer application_date
1 2010-05-08
2 2012-08-08
3 2013-06-23
and B:
customer date balance
1 2009-01-02 2500
1 2009-12-24 3000
2 2011-11-11 1200
2 2012-05-20 1900
3 2013-03-21 5500
3 2013-06-05 4500
I need to merge them together so that date in table A would be closest to date in table B (by customer)
Result:
customer application_date date balance
1 2010-05-08 2009-12-24 3000
2 2012-08-08 2012-05-20 1900
3 2013-06-23 2013-06-05 4500
How could I merge those two tables correctly?
Code for example tables:
A <- data.frame(customer = c(1,2,3),
application_date = c("2010-05-08", "2012-08-08", "2013-06-23"))
B <- data.frame(customer = c(1,1,2,2,3,3),
date = c("2009-01-02", "2009-12-24", "2011-11-11", "2012-05-20", "2013-03-21", "2013-06-05"),
balance = c(2500, 3000, 1200, 1900, 5500, 4500))
Upvotes: 0
Views: 228
Reputation: 361
Here is a solution using SQL in R. SQL/HiveQL might be useful if the raw data are huge and from a database.
require(sqldf)
# convert date
A$application_date = as.Date(A$application_date)
B$date = as.Date(B$date)
# find all possible intervals for each customer
AB=sqldf("select A.customer, A.application_date, B.date, B.balance, abs(A.application_date-B.date) as diff_day
from A,B
where A.customer = B.customer")
# extract the min interval for each customer
AB_min=sqldf("select customer, min(diff_day) as min_diff_day
from AB
group by 1")
# link the min one to the customer and remove irrelevant rows
AB_match=sqldf("select AB.customer, AB.application_date, AB.date, AB.balance
from AB_min
left join AB
on AB_min.customer = AB.customer
and AB_min.min_diff_day = AB.diff_day
group by 1")
AB_match is the final output.
Upvotes: 2
Reputation: 886948
Here is one option with rolling join
library(data.table)
setDT(B)[, application_date := date]
B[A, on = .(customer, date = application_date), roll = 'nearest']
# customer date balance application_date
#1: 1 2010-05-08 3000 2009-12-24
#2: 2 2012-08-08 1900 2012-05-20
#3: 3 2013-06-23 4500 2013-06-05
A$application_date <- as.Date(A$application_date)
B$date <- as.Date(B$date)
Upvotes: 4