kms
kms

Reputation: 2024

Pandas conditional merge 2 dataframes with one to many relationship

I am trying to merge two pandas DataFrames with one of many relationship. However, there are a couple of caveats. Explanation below.

import pandas as pd

df1 = pd.DataFrame({'name': ['AA', 'BB', 'CC', 'DD'],
                    'col1': [1, 2, 3, 4],
                    'col2': [1, 2, 3, 4] })

df2 = pd.DataFrame({'name': ['AA', 'AA', 'BB', 'BB', 'CC', 'DD'],
                    'col3': [0, 10, np.nan, 11, 12, np.nan] })

I'd like to merge the 2 DataFrames, however, ignore the 0 and np.nan in df2 when joining. I cannot simply filter df2 as there are other columns that I need.

Basically, I'd like to join on rows with one-to-many relationship that are not 0 or NaNs.

Expected output:

dd

Upvotes: 1

Views: 335

Answers (2)

eshirvana
eshirvana

Reputation: 24593

how about this :

merged_Df = df1.merge(df2.sort_values(['name','col3'], ascending=False).groupby(['name']).head(1), on='name')

output :

>>>
  name  col1  col2  col3
0   AA     1     1  10.0
1   BB     2     2  11.0
2   CC     3     3  12.0
3   DD     4     4   NaN

Upvotes: 1

user17242583
user17242583

Reputation:

One way:

>>> df1.merge(df2).drop_duplicates(subset=['name'], keep='last')
  name  col1  col2  col3
1   AA     1     1  10.0
3   BB     2     2  11.0
4   CC     3     3  12.0
5   DD     4     4  13.0

Upvotes: 0

Related Questions