bbk611

Reputation: 321

Pandas showing only the unique instances of a value in a dataframe for a given id

This is the dataframe I'm working with.

import pandas as pd

df = pd.DataFrame({'id' : ['45', '45', '45', '45', '46', '46'],
                  'description' : ['credit score too low', 'credit score too low', 'credit score too low', 'high risk of fraud', 'address not verified', 'address not verified']})
print(df)

I'm trying to modify the dataframe so that, for a given id, no description appears more than once. The dataframe below is the desired output.

newdf = pd.DataFrame({'id' : ['45', '45', '46'],
                  'description' : ['credit score too low', 'high risk of fraud', 'address not verified']})
print(newdf)

Upvotes: 1

Views: 33

Answers (1)

willeM_ Van Onsem

Reputation: 476669

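You can remove the duplicates with .drop_duplicates() [pandas-doc]. By default it compares all columns, so a row is dropped only when both its id and its description repeat an earlier row. For example: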

>>> df
   id           description
0  45  credit score too low
1  45  credit score too low
2  45  credit score too low
3  45    high risk of fraud
4  46  address not verified
5  46  address not verified
>>> df.drop_duplicates()
   id           description
0  45  credit score too low
3  45    high risk of fraud
4  46  address not verified

.drop_duplicates() returns a new dataframe rather than modifying df in place, so assign the result back to df:

df = df.drop_duplicates()
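
If the real dataframe has more columns than the two shown in the question, it may be safer to name the columns that define a duplicate. The following is a minimal sketch under that assumption; the variable name deduped is only illustrative, and the reset_index step is optional (it renumbers the rows instead of keeping the original labels 0, 3, 4).

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['45', '45', '45', '45', '46', '46'],
    'description': ['credit score too low', 'credit score too low',
                    'credit score too low', 'high risk of fraud',
                    'address not verified', 'address not verified'],
})

# Treat rows as duplicates only when both 'id' and 'description' match.
# With just these two columns this is the same as the default behaviour,
# but it keeps working if further columns are present.
deduped = df.drop_duplicates(subset=['id', 'description'])

# Optional: renumber the rows from 0 instead of keeping the original labels.
deduped = deduped.reset_index(drop=True)
print(deduped)
```

By default the first occurrence of each (id, description) pair is the one that is kept.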

Upvotes: 2
