JackJack
JackJack

Reputation: 143

Merging multiple dataframe columns with a condition

I am new to pandas and I want to merge two dataframe with multiple conditions applied to it. I have master_df and trade_df.

master_df
Country Product
China    Apple
India    Mango
Germany  Apple
India    Mango

master_df['Country'] can be either export or import which we can get from trade_df

trade_df
Export  Import   Product
India   US         Mango
China   UK         Apple
India   Germany    Apple 

My ultimate goal is to compare master_df['Country'] to trade_df['Export','Import'] and Product of both df and get the output as

master_df 
Country Product Export  Import 
China    Apple   China   UK
India    Mango   India   US
Germany  Apple   India  Germany

I am not sure if I am getting right as I am getting error.

master_df = master_df.merge(trade_df,  on =['country','Product'])

Upvotes: 0

Views: 45

Answers (1)

Christopher
Christopher

Reputation: 731

When you say join on ['Country', 'Product'], you tell pandas to find the two columns on both the dataframes. However, the columns for "country" on your two dataframes are different.

In your master_df, it is Country.

But in your trade_df, it is Export and Import.

Also, from your expectation, it can be seen that you actually want to join master_df twice with the trade_df on Country = Export and Country = Import respectively. So, you need to join them twice and concatenate them together.

What you should do (Please note that the order of the column names in the two lists must be corresponding):

export_df = master_df.merge(trade_df, left_on=['Country', 'Product'], right_on=['Export', 'Product']).drop_duplicates()
import_df = master_df.merge(trade_df, left_on=['Country', 'Product'], right_on=['Import', 'Product']).drop_duplicates()

df = export_df.append(import_df)
print(df)

This gives you the output:

    Country Product Export  Import
0   China   Apple   China   UK
1   India   Mango   India   US
0   Germany Apple   India   Germany

Upvotes: 1

Related Questions