Reputation: 25
I have the following dataset:
```
Fruit   Animal  Color   City
Apple   Dog     Yellow  Paris
Apple   Dog     Blue    Paris
Orange  Dog     Green   Paris
Grape   Dog     Pink    Paris
Orange  Dog     Grey    NY
Peach   Dog     Purple  Rome
```
I would like to use pandas to remove duplicate values within each column independently (not duplicate rows).
Expected output:
```
Fruit   Animal  Color   City
Apple   Dog     Yellow  Paris
Orange          Blue    NY
Grape           Green   Rome
Peach           Pink
                Grey
                Purple
```
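To make the example reproducible, here is a sketch that loads the sample data into a DataFrame named `df` (the name both answers assume). It also shows why a plain `DataFrame.drop_duplicates()` does not help: it compares whole rows, and no two rows in this dataset are identical.

```python
import pandas as pd

# Sample data from the question, stored in a DataFrame named df
df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Grape", "Orange", "Peach"],
    "Animal": ["Dog"] * 6,
    "Color": ["Yellow", "Blue", "Green", "Pink", "Grey", "Purple"],
    "City": ["Paris", "Paris", "Paris", "Paris", "NY", "Rome"],
})

# drop_duplicates() on the whole frame compares complete rows;
# no two rows here are identical, so all 6 are kept
rows_kept = df.drop_duplicates()
print(len(rows_kept))  # 6

# Distinct values per column: what per-column deduplication should keep
print(df.nunique())
```

The per-column counts (4 fruits, 1 animal, 6 colors, 3 cities) match the shape of the expected output.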
Upvotes: 2
Views: 71
Reputation: 2811
You can process the DataFrame column by column with `drop_duplicates`, resetting each column's index so the kept values move to the top:
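```python
# Deduplicate each column independently; reset_index moves the kept values to
# rows 0..k-1, and assigning back into df pads the remaining rows with NaN
for x in df.columns:
    df[x] = df[x].drop_duplicates().reset_index(drop=True)
```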
Output:
```
    Fruit  Animal   Color   City
0   Apple     Dog  Yellow  Paris
1  Orange     NaN    Blue     NY
2   Grape     NaN   Green   Rome
3   Peach     NaN    Pink    NaN
4     NaN     NaN    Grey    NaN
5     NaN     NaN  Purple    NaN
```
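A self-contained variant of the loop above, offered as a sketch: instead of overwriting `df` in place, it builds a new DataFrame from a dict of deduplicated columns (`result` is just an illustrative name). The DataFrame constructor aligns the columns on the union of their indexes, which pads the shorter ones with NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Grape", "Orange", "Peach"],
    "Animal": ["Dog"] * 6,
    "Color": ["Yellow", "Blue", "Green", "Pink", "Grey", "Purple"],
    "City": ["Paris", "Paris", "Paris", "Paris", "NY", "Rome"],
})

# Each deduplicated column gets a fresh 0..k-1 index; the constructor aligns
# the columns on the union of those indexes (0..5), padding short ones with NaN
result = pd.DataFrame(
    {col: df[col].drop_duplicates().reset_index(drop=True) for col in df.columns}
)
print(result)
```

Leaving `df` untouched can be handy if the original rows are still needed later.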
Upvotes: 0
Reputation: 323226
We can use `unique`, which returns each column's distinct values in order of first appearance:
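```python
import pandas as pd
# Transpose so each original column becomes a row, then take that row's unique values
s = df.T.apply(pd.Series.unique, axis=1)
# Rows of different lengths are padded with None; transpose back to columns
newdf = pd.DataFrame(s.tolist(), index=s.index).T
print(newdf)
```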
Output:
```
    Fruit  Animal   Color   City
0   Apple     Dog  Yellow  Paris
1  Orange    None    Blue     NY
2   Grape    None   Green   Rome
3   Peach    None    Pink   None
4    None    None    Grey   None
5    None    None  Purple   None
```
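The transpose is not strictly required. A minimal sketch that calls `Series.unique` on each column directly and wraps each result in a `pd.Series` (so it gets a 0..k-1 index); the constructor then pads the shorter columns with NaN instead of None. `result` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Orange", "Grape", "Orange", "Peach"],
    "Animal": ["Dog"] * 6,
    "Color": ["Yellow", "Blue", "Green", "Pink", "Grey", "Purple"],
    "City": ["Paris", "Paris", "Paris", "Paris", "NY", "Rome"],
})

# unique() keeps first-appearance order; pd.Series(...) gives each array a
# RangeIndex, and missing positions in shorter columns become NaN
result = pd.DataFrame({col: pd.Series(df[col].unique()) for col in df.columns})
print(result)
```

Either padding works with `isna()` and `dropna()`, since pandas treats both None and NaN as missing in object columns.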
Upvotes: 1