Getting rid of duplicate strings in a column in Pandas Dataframe

Question

I have a dataframe like this:

item     tags
1        awesome, awesome, great
2        cool, fun
3        boring, boring, average
4        ok, expensive

How can I remove the duplicate tags to get:

item     tags
1        awesome, great
2        cool, fun
3        boring, average
4        ok, expensive

Scott Boston · Accepted Answer

If I understand correctly, try:

df['new_tags'] = df['tags'].apply(lambda x: ', '.join(set(x.split(', '))))

Output (a set is unordered, so the tags within a row may not keep their original order):

   item                     tags         new_tags
0     1  awesome, awesome, great   awesome, great
1     2                cool, fun        cool, fun
2     3  boring, boring, average  average, boring
3     4            ok, expensive    expensive, ok
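
If the tags should stay in first-seen order, as in the expected result in the question, one option is to swap set for dict.fromkeys, which keeps keys in insertion order (Python 3.7+). This is a minimal sketch, not part of the original answer; the helper name dedupe_tags and its sep parameter are made up for illustration, and it assumes every row is a string with tags separated by ', '.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "item": [1, 2, 3, 4],
    "tags": [
        "awesome, awesome, great",
        "cool, fun",
        "boring, boring, average",
        "ok, expensive",
    ],
})


def dedupe_tags(tags, sep=", "):
    """Hypothetical helper: drop repeated tags, keeping first-seen order."""
    # dict.fromkeys keeps only the first occurrence of each key and
    # preserves insertion order, unlike set.
    return sep.join(dict.fromkeys(tags.split(sep)))


df["new_tags"] = df["tags"].apply(dedupe_tags)
print(df)
```

With this version rows 3 and 4 come out as "boring, average" and "ok, expensive" rather than in set order.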
