Reputation:
I have two dataframes
data = {'process': ['buying','selling','searhicng','repairing', 'preparing', 'selling','buying', 'searching', 'selling','searching'],
'type': ['in_progress','in_progress','end','in_progress', 'end', 'in_progress','in_progress', 'end', 'in_progress','end'],
'country': ['usa',np.nan, 'usa','ghana', 'ghana','ghana','portugal', 'portugal', 'portugal','england'],
'id': ['022','022','022', '011','011', '011','011', '011', '011','011'],
'created': ['2021-07-01','2021-07-03','2021-07-04','2021-07-05','2021-07-09','2021-07-10','2021-07-15','2021-07-17','2021-07-19','2021-07-21']
}
df = pd.DataFrame(data, columns = ['process', 'type', 'country', 'id', 'created'])
df['next_created'] = df.sort_values('created').groupby('id')['created'].shift(-1)
and
data = {'id': ['022','022','022', '011','011', '011','011', '011', '011','011'],
'product': ['apple','orange','watermelow', 'qiwi','pear', 'cherry','apple', 'qiwi', 'cherry','orange'],
'created': ['2021-07-02','2021-07-06','2021-07-07','2021-07-11','2021-07-12','2021-07-13','2021-07-16','2021-07-20','2021-07-25','2021-07-26']
}
df = pd.DataFrame(data, columns = ['product', 'id','created'])
I need to merge these dataframes by condition:
in each group by id
if df1.created < df2.created < df1.next_created
How can i do it?
Ouput result:
Upvotes: 0
Views: 10686
Reputation: 23166
You don't need to create the "next_created" column. Just use merge_asof
and then merge
:
#convert the created columns to datetime if needed
df1["created"] = pd.to_datetime(df1["created"])
df2["created"] = pd.to_datetime(df2["created"])
df3 = pd.merge_asof(df2, df1, by='id', on="created")
output = df1.merge(df3.drop("created", axis=1), how="left")
>>> output
process type country id created product
0 buying in_progress usa 022 2021-07-01 apple
1 selling in_progress NaN 022 2021-07-03 NaN
2 searhicng end usa 022 2021-07-04 orange
3 searhicng end usa 022 2021-07-04 watermelow
4 repairing in_progress ghana 011 2021-07-05 NaN
5 preparing end ghana 011 2021-07-09 NaN
6 selling in_progress ghana 011 2021-07-10 qiwi
7 selling in_progress ghana 011 2021-07-10 pear
8 selling in_progress ghana 011 2021-07-10 cherry
9 buying in_progress portugal 011 2021-07-15 apple
10 searching end portugal 011 2021-07-17 NaN
11 selling in_progress portugal 011 2021-07-19 qiwi
12 searching end england 011 2021-07-21 cherry
13 searching end england 011 2021-07-21 orange
Upvotes: 2
Reputation: 620
You can do the merge on the id and then filter the rows based on the condition.
df3 = pd.merge(df_1, df_2, on='id')
df3 = df3[(df3.created_x < df3.created_y) & (df3.created_y < df3.next_created)]
Upvotes: 1