Reputation: 341
I have two Pandas Data Frames. First one looks like (db1
):
id date_from date_until v1 v2
0 2015-06-17 2015-06-19 2 3
0 2015-06-17 2015-07-01 3 4
1 2015-06-18 2015-06-20 5 6
1 2015-06-19 2015-07-12 5 4
and the second one (db2
), like this:
id date v3 v4
0 2015-06-17 4 3
1 2015-06-17 5 6
1 2015-06-18 6 4
no i want to compress them so they based on id
and date
. Date compression should be if db2.date
is older or equal than db1.date_from
and use the youngest one from db2
witch is still older or equal than db1.data_from
.
So compressed data should look something like this:
id date_from date_until date v1 v2 v3 v4
0 2015-06-17 2015-06-19 2015-06-17 2 3 4 3
0 2015-06-17 2015-07-01 2015-06-17 3 4 4 3
1 2015-06-18 2015-06-20 2015-06-18 5 6 6 4
1 2015-06-19 2015-07-12 2015-06-18 5 4 6 4
I can't find any solution except iterating trough all rows, but this is not good enough because my data is bigger 100k of lines. Is there any tool to do this in pandas or should i use something else?
Thank you!
Upvotes: 3
Views: 236
Reputation: 27869
Another way to do it:
import pandas as pd
import sys
if sys.version_info[0] < 3:
from StringIO import StringIO
else:
from io import StringIO
a = StringIO('''id,date_from,date_until,v1,v2
0,2015-06-17,2015-06-19,2,3
0,2015-06-17,2015-07-01,3,4
1,2015-06-18,2015-06-20,5,6
1,2015-06-19,2015-07-12,5,4''')
b = StringIO('''id,date,v3,v4
0,2015-06-17,4,3
1,2015-06-17,5,6
1,2015-06-18,6,4''')
df1 = pd.read_csv(a, sep=',')
df2 = pd.read_csv(b, sep=',')
df1.merge(df2, how='left', left_on=['id', 'date_from'], right_on=['id', 'date']).ffill()
Upvotes: 2
Reputation: 153460
Let's use pd.merge_asof
:
pd.merge_asof(df1,df2, left_on=['date_from'], right_on=['date'], by='id')
Output:
id date_from date_until v1 v2 date v3 v4
0 0 2015-06-17 2015-06-19 2 3 2015-06-17 4 3
1 0 2015-06-17 2015-07-01 3 4 2015-06-17 4 3
2 1 2015-06-18 2015-06-20 5 6 2015-06-18 6 4
3 1 2015-06-19 2015-07-12 5 4 2015-06-18 6 4
Upvotes: 3