KT6345

Reputation: 33

How do I show all rows in DF where there are duplicate values in one column in python

I have two dataframes that I am trying to merge on one column (df['Number and postcode']). However, that column contains a number of duplicate values, so the merge doesn't work (it gives this error: # Check for duplicates). The other issue is that df1['Number and postcode'] might also contain duplicate values. How can I solve this, please?

This is the code I am using (the column selection should be a list that includes the join column, and on should name the column rather than pass the Series):

merged = pd.merge(df, df1[['Number and postcode', 'TOTAL_FLOOR_AREA', 'Bedrooms']], how='inner', on='Number and postcode')

Upvotes: 0

Views: 61

Answers (1)

David

Reputation: 1202

To see which rows are duplicates in your dataframe, you can simply use the built-in Pandas method duplicated():

df[df.duplicated(subset=['Number and postcode'], keep=False)]

Subsequently, you can drop all duplicates from both dataframes before merging using:

df.drop_duplicates(subset='Number and postcode', inplace=True)
df1.drop_duplicates(subset='Number and postcode', inplace=True)
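For completeness, here is a minimal sketch putting both steps together before the merge. The sample data and values are made up for illustration; only the column names ('Number and postcode', 'TOTAL_FLOOR_AREA', 'Bedrooms') come from the question.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's column names.
df = pd.DataFrame({
    'Number and postcode': ['1 AB1 2CD', '1 AB1 2CD', '2 EF3 4GH'],
    'Price': [100, 100, 200],
})
df1 = pd.DataFrame({
    'Number and postcode': ['1 AB1 2CD', '2 EF3 4GH', '2 EF3 4GH'],
    'TOTAL_FLOOR_AREA': [50, 75, 75],
    'Bedrooms': [2, 3, 3],
})

# Inspect the duplicates first (keep=False marks every duplicated row).
print(df[df.duplicated(subset=['Number and postcode'], keep=False)])

# Drop duplicates from both frames, then merge on the shared column.
df = df.drop_duplicates(subset='Number and postcode')
df1 = df1.drop_duplicates(subset='Number and postcode')

merged = pd.merge(
    df,
    df1[['Number and postcode', 'TOTAL_FLOOR_AREA', 'Bedrooms']],
    how='inner',
    on='Number and postcode',
)
print(merged)
```

Note that drop_duplicates keeps the first occurrence by default; if a different row should survive, sort the frame first or pass keep='last'.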

Upvotes: 1
