Reputation: 445
I have the below dataframes that I would like to merge. Merge df2 to df1.
df1 = {'ID1': ['A12', 'A13', 'A14','A15'], 'ArrivalDateTime': ["2021-09-20", "2021-09-21", "2021-09-26","2022-01-20"]}
df2 = {'ID': ['A12', 'A12', '003','A15','A15'], 'ArrivalDateCap': ["2021-09-17", "2021-09-21", "2021-09-20","2022-01-17","2022-01-19"]}
df1 = pd.DataFrame(df1)
df1["ArrivalDateTime"] = pd.to_datetime(df1["ArrivalDateTime"],format="%Y-%m-%d %H:%M")
df2 = pd.DataFrame(df2)
df2["ArrivalDateCap"] = pd.to_datetime(df2["ArrivalDateCap"],format="%Y-%m-%d %H:%M")
The common columns are ID1 in df1 and ID in df2. In addition, if they match(left join df1 and df2), I need to look for ArrivalDateCap less than but closest to the value in ArrivalDateTime for that record. That's the record from df2 that should be picked if more than one matching records exist. Sample output should be as below:
output_df = {'ID': ['A12', 'A13', 'A14','A15'], 'ArrivalDateTime': ["2021-09-20", "2021-09-21", "2021-09-26","2022-01-20"],'ID1':['A12','A13','A14','A15'],'ArrivalDateCap':['2021-09-17',np.NaN,np.NaN,'2022-01-19']}
output_df = pd.DataFrame(output_df)
Can I use pandas.merge_asof for this? If so, how can I define the ID1 and ID as well as ArrivalDateTime and ArrivalDateCap for this to work?
Upvotes: 0
Views: 34
Reputation: 862921
Use merge_asof
first:
df = pd.merge_asof(df1,
df2.sort_values('ArrivalDateCap'),
left_by='ID1',
right_by='ID',
left_on='ArrivalDateTime',
right_on='ArrivalDateCap')
print (df)
ID1 ArrivalDateTime ID ArrivalDateCap
0 A12 2021-09-20 A12 2021-09-17
1 A13 2021-09-21 NaN NaT
2 A14 2021-09-26 NaN NaT
3 A15 2022-01-20 A15 2022-01-19
And then for replace missing values use Series.fillna
:
df['ID'] = df['ID'].fillna(df['ID1'])
print (df)
ID1 ArrivalDateTime ID ArrivalDateCap
0 A12 2021-09-20 A12 2021-09-17
1 A13 2021-09-21 A13 NaT
2 A14 2021-09-26 A14 NaT
3 A15 2022-01-20 A15 2022-01-19
Upvotes: 1