Reputation: 20342
I am trying to merge two dataframes and I'm struggling to get this setup right. I Googled for a solution before posting here, but I'm still stuck. This is what I'm working with.
import pandas as pd
# Intitialise data of lists
data1 = [{'ID': 577878, 'Year':2020, 'Type': 'IB', 'Expense':6500},
{'ID': 577878, 'Year':2019, 'Type': 'IB', 'Expense':16500}]
df1 = pd.DataFrame(data1)
df1
data2 = [{'ID': 577878, 'Year':2020, 'Type': 'IB', 'Expense':23000}]
df2 = pd.DataFrame(data2)
df2
df_final = pd.merge(df1,
df2,
left_on=['ID'],
right_on=['ID'],
how='inner')
df_final
This makes sense, but I don't want the 23000 duplicated.
If I do the merge like this.
df_final = pd.merge(df1,
df2,
left_on=['ID','Year'],
right_on=['ID','Year'],
how='inner')
df_final
This also makes sense, but now the 16500 is dropped off because there is no 2019 in df2.
How can I keep both records, but not duplicate the 23000?
Upvotes: 1
Views: 56
Reputation: 23227
My interpretation is that you just don't want to see 2 entries of 23000 for both 2019 and 2020. It should be for 2020 only.
You can use outer merge (with parameter how='outer'
) on 2 columns ID
and Year
, as follows:
df_final = pd.merge(df1,
df2,
on=['ID','Year'],
how='outer')
Result:
print(df_final)
ID Year Type_x Expense_x Type_y Expense_y
0 577878 2020 IB 6500 IB 23000.0
1 577878 2019 IB 16500 NaN NaN
Upvotes: 2
Reputation: 153510
Try, column filter the df2
not to merge in that column:
df1.merge(df2[['ID', 'Year', 'Type']], on=['ID'])
Output:
ID Year_x Type_x Expense Year_y Type_y
0 577878 2020 IB 6500 2020 IB
1 577878 2019 IB 16500 2020 IB
Upvotes: 1