Reputation: 8247
I have following dataframes in pandas
df1 (LHS)
code date tank product key
123 2019-01-01 1 HS 123_2019-01-01_1
123 2019-01-01 1 HS 123_2019-01-01_1
123 2019-01-02 2 MS 123_2019-01-02_2
123 2019-01-02 1 HS 123_2019-01-02_1
df2_master (RHS)
code date tank product key
123 2019-01-01 1 MS 123_2019-01-01_1
123 2019-01-01 1 HS 123_2019-01-01_1
123 2019-01-02 2 MS 123_2019-01-02_2
123 2019-01-02 1 HS 123_2019-01-02_1
I want to merge df1 and df2_master with left join on key. Now df2_master has 2 products associated with same key for date 2019-01-01, so I want to flag this while merging two dataframes.
My desired dataframe should look like this.
df1 (LHS)
code date tank product key product_df2
123 2019-01-01 1 HS 123_2019-01-01_1 More than 1 product
123 2019-01-01 1 HS 123_2019-01-01_1 More than 1 product
123 2019-01-02 2 MS 123_2019-01-02_2 MS
123 2019-01-02 1 HS 123_2019-01-02_1 HS
How do I do it in pandas?
Upvotes: 1
Views: 273
Reputation: 862561
Create column product_df2
for check if duplicates by DataFrame.duplicated
, merge with rows without dupes by DataFrame.drop_duplicates
and last set values by numpy.where
:
df2_master['product_df2'] = df2_master.duplicated(subset=['key'], keep=False)
df = df1.merge(df2_master.drop_duplicates('key'), how='left', on='key', suffixes=('','_'))
df['product_df2'] = np.where(df['product_df2'], 'More than 1 product', df['product_'])
#remove unnecessary columns
df = df.loc[:, ~df.columns.str.endswith('_')]
print (df)
code date tank product key product_df2
0 123 2019-01-01 1 HS 123_2019-01-01_1 More than 1 product
1 123 2019-01-01 1 HS 123_2019-01-01_1 More than 1 product
2 123 2019-01-02 2 MS 123_2019-01-02_2 MS
3 123 2019-01-02 1 HS 123_2019-01-02_1 HS
Upvotes: 1