Reputation: 33
I have two dataframes which I am trying to merge on one column, df['Number and postcode']. However, that column has a number of duplicate values, so the merge doesn't work as expected. The other issue is that df1['Number and postcode'] might also have a number of duplicate values. How can I solve this, please?
This is the line I am using (the column selection has been corrected from a set literal to a list, the merge key added to it, and on= changed to the column name):
merged = pd.merge(df, df1[['Number and postcode', 'TOTAL_FLOOR_AREA', 'Bedrooms']], how='inner', on='Number and postcode')
Upvotes: 0
Views: 61
Reputation: 1202
To see which rows are duplicates in your dataframe, you can simply use the pandas built-in method duplicated():
df[df.duplicated(subset=['Number and postcode'], keep=False)]
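As a quick illustration with made-up data (the column names mirror the question; the values are invented), keep=False flags every member of a duplicated group, not just the repeats:

```python
import pandas as pd

# Hypothetical data standing in for the question's dataframe
df = pd.DataFrame({
    'Number and postcode': ['1 AB1 2CD', '1 AB1 2CD', '2 EF3 4GH'],
    'Price': [100, 110, 200],
})

# keep=False marks all rows that share a key, so both copies show up
dupes = df[df.duplicated(subset=['Number and postcode'], keep=False)]
print(dupes)
```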
Subsequently, you can drop the duplicate rows (by default keeping the first occurrence of each key) from both dataframes before merging:
df.drop_duplicates(subset='Number and postcode', inplace=True)
df1.drop_duplicates(subset='Number and postcode', inplace=True)
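Putting the two steps together, here is a minimal end-to-end sketch with invented data: deduplicate both frames on the shared key, then do the inner merge from the question (selecting the key column alongside TOTAL_FLOOR_AREA and Bedrooms so it is available to merge on):

```python
import pandas as pd

# Hypothetical data standing in for the question's two dataframes
df = pd.DataFrame({
    'Number and postcode': ['1 AB1 2CD', '1 AB1 2CD', '2 EF3 4GH'],
    'Price': [100, 110, 200],
})
df1 = pd.DataFrame({
    'Number and postcode': ['1 AB1 2CD', '2 EF3 4GH', '2 EF3 4GH'],
    'TOTAL_FLOOR_AREA': [50.0, 75.0, 80.0],
    'Bedrooms': [1, 2, 2],
})

# drop_duplicates keeps the first row for each key by default
df = df.drop_duplicates(subset='Number and postcode')
df1 = df1.drop_duplicates(subset='Number and postcode')

# Inner merge on the now-unique key column
merged = pd.merge(
    df,
    df1[['Number and postcode', 'TOTAL_FLOOR_AREA', 'Bedrooms']],
    how='inner',
    on='Number and postcode',
)
print(merged)
```

With unique keys on both sides the merge is one-to-one, so the result has at most one row per key.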
Upvotes: 1