Reputation: 33
I can't figure out how to merge 2 pandas DataFrames with certain conditions.
I have this:
import pandas as pd
df1 = pd.DataFrame({'Name': ['A', 'A', 'A', 'B' ,'B' ,'B'],
'Date': ['10/15/2019', '10/16/2019', '10/17/2019', '10/15/2019', '10/16/2019', '10/17/2019'],
'Value 1': [101, 102, 103, 201, 202, 203],
'Value 2': ['A1', 'A2', 'A3', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B' ,'C'],
'Date': ['10/14/2019', '10/15/2019', '10/13/2019', '10/18/2019', '10/18/2019'],
'Value 3': [2, 2, 22, 44, 222]})
I need to get to this
I need merge only names that exist in df1, but add dates that exist in df2. However, those dates should be greater than 10/13/2019.
The closest I could get is
df_m = pd.merge(df1, df2, how='outer', left_on=['Name', 'Date'], right_on=['Name', 'Date'], sort=True)
But this includes everything ('C' from df2 - I don't need it and date 10/13/2019 for 'B' - I don't need it either).
I'd really appreciate any help.
Edited: We can drop the condition of dates being greater than 10/13/2019 - I figured this out by just created another df2_mod where I didn't include dates I don't need. However, some dates that exist in df2_mod are not in df1 and I need to merge them with values 3. However I don't need Names from df2 into df1 that don't exist in df1
Upvotes: 0
Views: 73
Reputation: 195593
Here is one solution (I don't filter for dates being greater than 10/13/2019, as stated in the updated question):
df2
and drop all 'Names'
that don't exist in df1
df1
and df2_filtered
on Name and Date columnsimport pandas as pd
df1 = pd.DataFrame({'Name': ['A', 'A', 'A', 'B' ,'B' ,'B'],
'Date': ['10/15/2019', '10/16/2019', '10/17/2019', '10/15/2019', '10/16/2019', '10/17/2019'],
'Value 1': [101, 102, 103, 201, 202, 203],
'Value 2': ['A1', 'A2', 'A3', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'Name': ['A', 'A', 'B', 'B' ,'C'],
'Date': ['10/14/2019', '10/15/2019', '10/13/2019', '10/18/2019', '10/18/2019'],
'Value 3': [2, 2, 22, 44, 222]})
df2_filtered = df2[df2['Name'].isin(df1['Name'])] # we want df2 rows with `Names` that exists in df1
print(df1.merge(df2_filtered, on=['Name', 'Date'], how='outer').sort_values(['Name', 'Date']).reset_index(drop=True))
Prints:
Name Date Value 1 Value 2 Value 3
0 A 10/14/2019 NaN NaN 2.0
1 A 10/15/2019 101.0 A1 2.0
2 A 10/16/2019 102.0 A2 NaN
3 A 10/17/2019 103.0 A3 NaN
4 B 10/13/2019 NaN NaN 22.0
5 B 10/15/2019 201.0 B1 NaN
6 B 10/16/2019 202.0 B2 NaN
7 B 10/17/2019 203.0 B3 NaN
8 B 10/18/2019 NaN NaN 44.0
Upvotes: 1