EStark
EStark

Reputation: 193

Merge two data frames by nearest date and ID

I have two data frames

A:

customer application_date
1        2010-05-08
2        2012-08-08
3        2013-06-23

and B:

customer date        balance
1        2009-01-02  2500
1        2009-12-24  3000
2        2011-11-11  1200  
2        2012-05-20  1900
3        2013-03-21  5500
3        2013-06-05  4500

I need to merge them together so that date in table A would be closest to date in table B (by customer)

Result:

customer        application_date  date        balance
1               2010-05-08        2009-12-24  3000
2               2012-08-08        2012-05-20  1900
3               2013-06-23        2013-06-05  4500

How could I merge those two tables correctly?

Code for example tables:

A <- data.frame(customer = c(1,2,3),
                application_date = c("2010-05-08", "2012-08-08", "2013-06-23"))


B <- data.frame(customer = c(1,1,2,2,3,3),
                date = c("2009-01-02", "2009-12-24", "2011-11-11", "2012-05-20", "2013-03-21", "2013-06-05"),
                balance = c(2500, 3000, 1200, 1900, 5500, 4500))

Upvotes: 0

Views: 228

Answers (2)

Fan Wang
Fan Wang

Reputation: 361

Here is a solution using SQL in R. SQL/HiveQL might be useful if the raw data are huge and from a database.

require(sqldf)

# convert date 
A$application_date = as.Date(A$application_date)
B$date = as.Date(B$date)


# find all possible intervals for each customer
AB=sqldf("select A.customer, A.application_date, B.date, B.balance, abs(A.application_date-B.date) as diff_day
      from A,B
      where A.customer = B.customer")

# extract the min interval for each customer
AB_min=sqldf("select customer, min(diff_day) as min_diff_day
          from AB
          group by 1")

# link the min one to the customer and remove irrelevant rows
AB_match=sqldf("select AB.customer, AB.application_date, AB.date, AB.balance
          from AB_min 
          left join AB
          on AB_min.customer = AB.customer
          and AB_min.min_diff_day = AB.diff_day
          group by 1")

AB_match is the final output.

Upvotes: 2

akrun
akrun

Reputation: 886948

Here is one option with rolling join

library(data.table)
setDT(B)[, application_date := date]
B[A, on = .(customer, date = application_date), roll = 'nearest']
#  customer       date balance application_date
#1:        1 2010-05-08    3000       2009-12-24
#2:        2 2012-08-08    1900       2012-05-20
#3:        3 2013-06-23    4500       2013-06-05

data

A$application_date <- as.Date(A$application_date)
B$date <- as.Date(B$date)

Upvotes: 4

Related Questions