Reputation: 2024
I am trying to merge two pandas DataFrames with one of many relationship. However, there are a couple of caveats. Explanation below.
import pandas as pd
df1 = pd.DataFrame({'name': ['AA', 'BB', 'CC', 'DD'],
'col1': [1, 2, 3, 4],
'col2': [1, 2, 3, 4] })
df2 = pd.DataFrame({'name': ['AA', 'AA', 'BB', 'BB', 'CC', 'DD'],
'col3': [0, 10, np.nan, 11, 12, np.nan] })
I'd like to merge the 2 DataFrames, however, ignore the 0 and np.nan
in df2
when joining. I cannot simply filter df2
as there are other columns that I need.
Basically, I'd like to join on rows with one-to-many relationship that are not 0 or NaNs.
Expected output:
Upvotes: 1
Views: 335
Reputation: 24593
how about this :
merged_Df = df1.merge(df2.sort_values(['name','col3'], ascending=False).groupby(['name']).head(1), on='name')
output :
>>>
name col1 col2 col3
0 AA 1 1 10.0
1 BB 2 2 11.0
2 CC 3 3 12.0
3 DD 4 4 NaN
Upvotes: 1
Reputation:
One way:
>>> df1.merge(df2).drop_duplicates(subset=['name'], keep='last')
name col1 col2 col3
1 AA 1 1 10.0
3 BB 2 2 11.0
4 CC 3 3 12.0
5 DD 4 4 13.0
Upvotes: 0