Reputation: 321
This is the dataframe I'm working with.
import pandas as pd

df = pd.DataFrame({'id': ['45', '45', '45', '45', '46', '46'],
                   'description': ['credit score too low', 'credit score too low', 'credit score too low', 'high risk of fraud', 'address not verified', 'address not verified']})
print(df)
I'm trying to modify the dataframe so that, for a given id, no description appears more than once. The dataframe below is the desired output.
newdf = pd.DataFrame({'id': ['45', '45', '46'],
                      'description': ['credit score too low', 'high risk of fraud', 'address not verified']})
print(newdf)
Upvotes: 1
Views: 33
Reputation: 476669
You can remove the duplicates with .drop_duplicates() [pandas-doc]. For example:
>>> df
id description
0 45 credit score too low
1 45 credit score too low
2 45 credit score too low
3 45 high risk of fraud
4 46 address not verified
5 46 address not verified
>>> df.drop_duplicates()
id description
0 45 credit score too low
3 45 high risk of fraud
4 46 address not verified
Because .drop_duplicates() returns a new dataframe rather than modifying df in place, assign the result back to df:
df = df.drop_duplicates()
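If the real dataframe has more columns than the two shown here, a plain .drop_duplicates() would compare all of them. A minimal sketch, assuming you only want to compare id and description and would like a fresh 0..n-1 index afterwards (the name deduped is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['45', '45', '45', '45', '46', '46'],
    'description': ['credit score too low', 'credit score too low',
                    'credit score too low', 'high risk of fraud',
                    'address not verified', 'address not verified'],
})

# subset= limits the duplicate check to the listed columns; the first
# occurrence of each (id, description) pair is kept by default.
deduped = df.drop_duplicates(subset=['id', 'description'])

# drop_duplicates keeps the original row labels (0, 3, 4 here);
# reset_index(drop=True) renumbers them from 0.
deduped = deduped.reset_index(drop=True)

print(deduped)
```

With only these two columns the subset argument gives the same rows as the plain call above; it starts to matter once other columns differ between otherwise identical rows.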
Upvotes: 2