ASH
ASH

Reputation: 20342

How can we handle dupes in a merged dataframe?

I am trying to merge two dataframes and I'm struggling to get this setup right. I Googled for a solution before posting here, but I'm still stuck. This is what I'm working with.

import pandas as pd
 
# Intitialise data of lists
data1 = [{'ID': 577878, 'Year':2020, 'Type': 'IB', 'Expense':6500}, 
         {'ID': 577878, 'Year':2019, 'Type': 'IB', 'Expense':16500}]
df1 = pd.DataFrame(data1)
df1

data2 = [{'ID': 577878, 'Year':2020, 'Type': 'IB', 'Expense':23000}]
df2 = pd.DataFrame(data2)
df2


df_final = pd.merge(df1, 
          df2,
          left_on=['ID'],
          right_on=['ID'],
          how='inner')
df_final

enter image description here

This makes sense, but I don't want the 23000 duplicated.

If I do the merge like this.

df_final = pd.merge(df1, 
          df2,
          left_on=['ID','Year'],
          right_on=['ID','Year'],
          how='inner')
df_final

enter image description here

This also makes sense, but now the 16500 is dropped off because there is no 2019 in df2.

How can I keep both records, but not duplicate the 23000?

Upvotes: 1

Views: 56

Answers (2)

SeaBean
SeaBean

Reputation: 23227

My interpretation is that you just don't want to see 2 entries of 23000 for both 2019 and 2020. It should be for 2020 only.

You can use outer merge (with parameter how='outer') on 2 columns ID and Year, as follows:

df_final =  pd.merge(df1, 
                     df2,
                     on=['ID','Year'],
                     how='outer')

Result:

print(df_final)

       ID  Year Type_x  Expense_x Type_y  Expense_y
0  577878  2020     IB       6500     IB    23000.0
1  577878  2019     IB      16500    NaN        NaN

Upvotes: 2

Scott Boston
Scott Boston

Reputation: 153510

Try, column filter the df2 not to merge in that column:

df1.merge(df2[['ID', 'Year', 'Type']], on=['ID'])

Output:

       ID  Year_x Type_x  Expense  Year_y Type_y
0  577878    2020     IB     6500    2020     IB
1  577878    2019     IB    16500    2020     IB

Upvotes: 1

Related Questions