Reputation: 63
I need to add rows from DF2 to DF1, only if they don't already exist in DF1, based on colA. Basically, don't add the row into DF1 if the column value of ColA already exists.
df1 = pd.DataFrame({ 'colA': [10,10, 20, 30],
'colB': [100, 20, 300,54],
'colC': [100, 20, 300,67]})
df2 = pd.DataFrame({ 'colA': [40, 20, 50],
'colB': [400, 200, 500]})
this is what I have so far:
new_df = pd.concat([df1, df2]).drop_duplicates(["colA"]).reset_index(drop=True)
The reason drop_duplicates
doesn't work well for me is that I need the duplicates in df1 to remain.
Upvotes: 0
Views: 94
Reputation: 377
This may help you.
import pandas as pd
df1 = pd.DataFrame({ 'colA': [10,10, 20, 30],
'colB': [100, 20, 300,54],
'colC': [100, 20, 300,67]})
df2 = pd.DataFrame({ 'colA': [40, 20, 50],
'colB': [400, 200, 500]})
new_df = pd.concat([df1, df2], join="outer").reset_index(drop=True)
print(new_df)
Here is the output.
colA colB colC
0 10 100 100.0
1 10 20 20.0
2 20 300 300.0
3 30 54 67.0
4 40 400 NaN
5 20 200 NaN
6 50 500 NaN
Upvotes: -1
Reputation: 18647
Use Series.isin
with NOT logical operator ~
to filter the rows from df2
to append:
new_df = pd.concat([df1, df2[~df2['colA'].isin(df1['colA'])]], ignore_index=True)
[out]
colA colB colC
0 10 100 100.0
1 10 20 20.0
2 20 300 300.0
3 30 54 67.0
4 40 400 NaN
5 50 500 NaN
Upvotes: 2