Kor
Kor

Reputation: 63

How do i concat dataframes without duplicates however keeping duplicates in the first dataframe

I need to add rows from DF2 to DF1, only if they don't already exist in DF1, based on colA. Basically, don't add the row into DF1 if the column value of ColA already exists.

df1 = pd.DataFrame({ 'colA': [10,10, 20, 30],
                     'colB': [100, 20, 300,54],
                     'colC': [100, 20, 300,67]})

df2 = pd.DataFrame({ 'colA': [40, 20, 50],
                     'colB': [400, 200, 500]})

this is what I have so far:

new_df = pd.concat([df1, df2]).drop_duplicates(["colA"]).reset_index(drop=True)

The reason drop_duplicates doesn't work well for me is that I need the duplicates in df1 to remain.

Upvotes: 0

Views: 94

Answers (2)

Ramlakhan Kevat
Ramlakhan Kevat

Reputation: 377

This may help you.

import pandas as pd
df1 = pd.DataFrame({ 'colA': [10,10, 20, 30],
                     'colB': [100, 20, 300,54],
                     'colC': [100, 20, 300,67]})

df2 = pd.DataFrame({ 'colA': [40, 20, 50],
                     'colB': [400, 200, 500]})

new_df = pd.concat([df1, df2], join="outer").reset_index(drop=True)
print(new_df)

Here is the output.

   colA  colB   colC
0    10   100  100.0
1    10    20   20.0
2    20   300  300.0
3    30    54   67.0
4    40   400    NaN
5    20   200    NaN
6    50   500    NaN

Upvotes: -1

Chris Adams
Chris Adams

Reputation: 18647

Use Series.isin with NOT logical operator ~ to filter the rows from df2 to append:

new_df = pd.concat([df1, df2[~df2['colA'].isin(df1['colA'])]], ignore_index=True)

[out]

   colA  colB   colC
0    10   100  100.0
1    10    20   20.0
2    20   300  300.0
3    30    54   67.0
4    40   400    NaN
5    50   500    NaN

Upvotes: 2

Related Questions