Neil
Neil

Reputation: 8247

left join in pandas with multiple records with same key

I have following dataframes in pandas

df1 (LHS)
code     date         tank       product     key
123      2019-01-01   1          HS          123_2019-01-01_1
123      2019-01-01   1          HS          123_2019-01-01_1
123      2019-01-02   2          MS          123_2019-01-02_2
123      2019-01-02   1          HS          123_2019-01-02_1

df2_master (RHS)
code     date         tank       product     key
123      2019-01-01   1          MS          123_2019-01-01_1
123      2019-01-01   1          HS          123_2019-01-01_1
123      2019-01-02   2          MS          123_2019-01-02_2
123      2019-01-02   1          HS          123_2019-01-02_1

I want to merge df1 and df2_master with left join on key. Now df2_master has 2 products associated with same key for date 2019-01-01, so I want to flag this while merging two dataframes.

My desired dataframe should look like this.

df1 (LHS)
code     date         tank       product     key                 product_df2
123      2019-01-01   1          HS          123_2019-01-01_1    More than 1 product 
123      2019-01-01   1          HS          123_2019-01-01_1    More than 1 product 
123      2019-01-02   2          MS          123_2019-01-02_2    MS
123      2019-01-02   1          HS          123_2019-01-02_1    HS

How do I do it in pandas?

Upvotes: 1

Views: 273

Answers (1)

jezrael
jezrael

Reputation: 862561

Create column product_df2 for check if duplicates by DataFrame.duplicated, merge with rows without dupes by DataFrame.drop_duplicates and last set values by numpy.where:

df2_master['product_df2'] = df2_master.duplicated(subset=['key'], keep=False)
df = df1.merge(df2_master.drop_duplicates('key'), how='left', on='key', suffixes=('','_'))
df['product_df2'] = np.where(df['product_df2'], 'More than 1 product', df['product_'])

#remove unnecessary columns
df = df.loc[:, ~df.columns.str.endswith('_')]
print (df)
   code        date  tank product               key          product_df2
0   123  2019-01-01     1      HS  123_2019-01-01_1  More than 1 product
1   123  2019-01-01     1      HS  123_2019-01-01_1  More than 1 product
2   123  2019-01-02     2      MS  123_2019-01-02_2                   MS
3   123  2019-01-02     1      HS  123_2019-01-02_1                   HS

Upvotes: 1

Related Questions