Reputation: 125
Working in Python / pandas / dataframes
I have these two dataframes:
Dataframe one:
1 2 3
1 Stockholm 100 250
2 Stockholm 150 376
3 Stockholm 105 235
4 Stockholm 109 104
5 Burnley 145 234
6 Burnley 100 250
Dataframe two:
1 2 3
1 Stockholm 100 250
2 Stockholm 117 128
3 Stockholm 105 235
4 Stockholm 100 250
5 Burnley 145 234
6 Burnley 100 953
And I would like to find the duplicate rows found in Dataframe one and Dataframe two and remove the duplicates from Dataframe one. As in data frame two, you can find row 1, 3, 5 in data frame one, which would remove them from data frame on and create the below:
1 2 3
1 Stockholm 150 376
2 Stockholm 109 104
3 Burnley 100 250
Upvotes: 11
Views: 22298
Reputation: 651
A one-liner:
df1.merge(df2, indicator=True, how='outer').query('_merge=="left_only"').drop('_merge', axis=1)
Output:
1 2 3
2 Stockholm 150 376
4 Stockholm 109 104
6 Burnley 100 250
Upvotes: 5
Reputation: 881
Use:
df_merge = pd.merge(df1, df2, on=[1,2,3], how='inner')
df1 = df1.append(df_merge)
df1['Duplicated'] = df1.duplicated(keep=False) # keep=False marks the duplicated row with a True
df_final = df1[~df1['Duplicated']] # selects only rows which are not duplicated.
del df_final['Duplicated'] # delete the indicator column
The idea is as follows:
Each number corresponds to each line of code.
Upvotes: 11