Join DataFrame based on date which is between dates from another DataFrame

Question

I have two Pandas Data Frames. First one looks like (db1):

id   date_from   date_until   v1   v2
0    2015-06-17  2015-06-19    2    3
0    2015-06-17  2015-07-01    3    4 
1    2015-06-18  2015-06-20    5    6
1    2015-06-19  2015-07-12    5    4

and the second one (db2), like this:

id      date      v3   v4
0    2015-06-17   4    3
1    2015-06-17   5    6 
1    2015-06-18   6    4

no i want to compress them so they based on id and date. Date compression should be if db2.date is older or equal than db1.date_from and use the youngest one from db2 witch is still older or equal than db1.data_from. So compressed data should look something like this:

id   date_from   date_until      date      v1   v2   v3   v4
0    2015-06-17  2015-06-19   2015-06-17   2    3    4    3
0    2015-06-17  2015-07-01   2015-06-17   3    4    4    3
1    2015-06-18  2015-06-20   2015-06-18   5    6    6    4
1    2015-06-19  2015-07-12   2015-06-18   5    4    6    4

I can't find any solution except iterating trough all rows, but this is not good enough because my data is bigger 100k of lines. Is there any tool to do this in pandas or should i use something else?

Thank you!

Scott Boston · Accepted Answer

Let's use pd.merge_asof:

pd.merge_asof(df1,df2, left_on=['date_from'], right_on=['date'], by='id')

Output:

   id  date_from  date_until  v1  v2       date  v3  v4
0   0 2015-06-17  2015-06-19   2   3 2015-06-17   4   3
1   0 2015-06-17  2015-07-01   3   4 2015-06-17   4   3
2   1 2015-06-18  2015-06-20   5   6 2015-06-18   6   4
3   1 2015-06-19  2015-07-12   5   4 2015-06-18   6   4

Join DataFrame based on date which is between dates from another DataFrame

Answers (2)

Related Questions