Reputation: 143
I am new to pandas and I want to merge two dataframe with multiple conditions applied to it. I have master_df and trade_df.
master_df
Country Product
China Apple
India Mango
Germany Apple
India Mango
master_df['Country'] can be either export or import which we can get from trade_df
trade_df
Export Import Product
India US Mango
China UK Apple
India Germany Apple
My ultimate goal is to compare master_df['Country'] to trade_df['Export','Import'] and Product of both df and get the output as
master_df
Country Product Export Import
China Apple China UK
India Mango India US
Germany Apple India Germany
I am not sure if I am getting right as I am getting error.
master_df = master_df.merge(trade_df, on =['country','Product'])
Upvotes: 0
Views: 45
Reputation: 731
When you say join on ['Country', 'Product']
, you tell pandas to find the two columns on both the dataframes. However, the columns for "country" on your two dataframes are different.
In your master_df
, it is Country
.
But in your trade_df
, it is Export
and Import
.
Also, from your expectation, it can be seen that you actually want to join master_df
twice with the trade_df
on Country = Export
and Country = Import
respectively. So, you need to join them twice and concatenate them together.
What you should do (Please note that the order of the column names in the two lists must be corresponding):
export_df = master_df.merge(trade_df, left_on=['Country', 'Product'], right_on=['Export', 'Product']).drop_duplicates()
import_df = master_df.merge(trade_df, left_on=['Country', 'Product'], right_on=['Import', 'Product']).drop_duplicates()
df = export_df.append(import_df)
print(df)
This gives you the output:
Country Product Export Import
0 China Apple China UK
1 India Mango India US
0 Germany Apple India Germany
Upvotes: 1