How do i concat dataframes without duplicates however keeping duplicates in the first dataframe

Question

I need to add rows from DF2 to DF1, only if they don't already exist in DF1, based on colA. Basically, don't add the row into DF1 if the column value of ColA already exists.

df1 = pd.DataFrame({ 'colA': [10,10, 20, 30],
                     'colB': [100, 20, 300,54],
                     'colC': [100, 20, 300,67]})

df2 = pd.DataFrame({ 'colA': [40, 20, 50],
                     'colB': [400, 200, 500]})

this is what I have so far:

new_df = pd.concat([df1, df2]).drop_duplicates(["colA"]).reset_index(drop=True)

The reason drop_duplicates doesn't work well for me is that I need the duplicates in df1 to remain.

Chris Adams · Accepted Answer

Use Series.isin with NOT logical operator ~ to filter the rows from df2 to append:

new_df = pd.concat([df1, df2[~df2['colA'].isin(df1['colA'])]], ignore_index=True)

[out]

   colA  colB   colC
0    10   100  100.0
1    10    20   20.0
2    20   300  300.0
3    30    54   67.0
4    40   400    NaN
5    50   500    NaN

How do i concat dataframes without duplicates however keeping duplicates in the first dataframe

Answers (2)

Related Questions