user15920209
user15920209

Reputation:

Pandas merge by condition

I have two dataframes

data = {'process': ['buying','selling','searhicng','repairing', 'preparing', 'selling','buying', 'searching', 'selling','searching'],
        'type': ['in_progress','in_progress','end','in_progress', 'end', 'in_progress','in_progress', 'end', 'in_progress','end'],
        'country': ['usa',np.nan, 'usa','ghana', 'ghana','ghana','portugal', 'portugal', 'portugal','england'],
        'id': ['022','022','022', '011','011', '011','011', '011', '011','011'],
        'created': ['2021-07-01','2021-07-03','2021-07-04','2021-07-05','2021-07-09','2021-07-10','2021-07-15','2021-07-17','2021-07-19','2021-07-21']        
}

df = pd.DataFrame(data, columns = ['process', 'type', 'country', 'id', 'created'])
df['next_created'] = df.sort_values('created').groupby('id')['created'].shift(-1)

enter image description here

and

data = {'id': ['022','022','022', '011','011', '011','011', '011', '011','011'],
        'product': ['apple','orange','watermelow', 'qiwi','pear', 'cherry','apple', 'qiwi', 'cherry','orange'],
        'created': ['2021-07-02','2021-07-06','2021-07-07','2021-07-11','2021-07-12','2021-07-13','2021-07-16','2021-07-20','2021-07-25','2021-07-26']        
}
df = pd.DataFrame(data, columns = ['product', 'id','created'])

enter image description here

I need to merge these dataframes by condition: in each group by id if df1.created < df2.created < df1.next_created

How can i do it?

Ouput result:

enter image description here

Upvotes: 0

Views: 10686

Answers (2)

not_speshal
not_speshal

Reputation: 23166

You don't need to create the "next_created" column. Just use merge_asof and then merge:

#convert the created columns to datetime if needed
df1["created"] = pd.to_datetime(df1["created"])
df2["created"] = pd.to_datetime(df2["created"])

df3 = pd.merge_asof(df2, df1, by='id', on="created")
output = df1.merge(df3.drop("created", axis=1), how="left")

>>> output
      process         type   country   id    created     product
0      buying  in_progress       usa  022 2021-07-01       apple
1     selling  in_progress       NaN  022 2021-07-03         NaN
2   searhicng          end       usa  022 2021-07-04      orange
3   searhicng          end       usa  022 2021-07-04  watermelow
4   repairing  in_progress     ghana  011 2021-07-05         NaN
5   preparing          end     ghana  011 2021-07-09         NaN
6     selling  in_progress     ghana  011 2021-07-10        qiwi
7     selling  in_progress     ghana  011 2021-07-10        pear
8     selling  in_progress     ghana  011 2021-07-10      cherry
9      buying  in_progress  portugal  011 2021-07-15       apple
10  searching          end  portugal  011 2021-07-17         NaN
11    selling  in_progress  portugal  011 2021-07-19        qiwi
12  searching          end   england  011 2021-07-21      cherry
13  searching          end   england  011 2021-07-21      orange

Upvotes: 2

Sonia Samipillai
Sonia Samipillai

Reputation: 620

You can do the merge on the id and then filter the rows based on the condition.

df3 = pd.merge(df_1, df_2, on='id')
df3 = df3[(df3.created_x < df3.created_y) & (df3.created_y < df3.next_created)]

Upvotes: 1

Related Questions