edyvedy13
Reputation: 2296

Getting rid of duplicate strings in a column in Pandas Dataframe

I have a dataframe like this:

item     tags
1        awesome, awesome, great
2        cool, fun
3        boring, boring, average
4        ok, expensive

How can I remove the duplicate tags to get:

item     tags
1        awesome, great
2        cool, fun
3        boring, average
4        ok, expensive

Upvotes: 0

Views: 64

Answers (2)

Scott Boston
Reputation: 153460

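If I understand correctly, try the following. Note that set does not preserve the original order of the tags, so the order within each row may change: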

df['new_tags'] = df['tags'].apply(lambda x: ', '.join(set(x.split(', '))))

Output:

   item                     tags         new_tags
0     1  awesome, awesome, great   awesome, great
1     2                cool, fun        cool, fun
2     3  boring, boring, average  average, boring
3     4            ok, expensive    expensive, ok
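
If the original tag order matters, one possible variant is to deduplicate with dict.fromkeys, which keeps the first occurrence of each key in insertion order. This is only a sketch built on the sample data from the question; the helper name dedupe_tags is made up for illustration and is not part of the answer above.

```python
import pandas as pd

# Sample data mirroring the question.
df = pd.DataFrame({
    "item": [1, 2, 3, 4],
    "tags": [
        "awesome, awesome, great",
        "cool, fun",
        "boring, boring, average",
        "ok, expensive",
    ],
})


def dedupe_tags(tags: str) -> str:
    """Hypothetical helper: drop repeated tags, keeping first-seen order."""
    # dict.fromkeys keeps one entry per distinct tag, in insertion order.
    return ", ".join(dict.fromkeys(tags.split(", ")))


df["new_tags"] = df["tags"].apply(dedupe_tags)
print(df)
```

Unlike the set version, this leaves row 3 as "boring, average" rather than reordering it.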

Upvotes: 0

Andy L.
Reputation: 25239

Use a list comprehension with str.split, pd.unique and join. pd.unique keeps the tags in order of first appearance:

df['unique_tags'] = [', '.join(pd.unique(x)) for x in df.tags.str.split(', ')]

Output:
   item                     tags      unique_tags
0     1  awesome, awesome, great   awesome, great
1     2                cool, fun        cool, fun
2     3  boring, boring, average  boring, average
3     4            ok, expensive    ok, expensive
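
Both answers split on the exact separator ', '. If the real column has inconsistent spacing around the commas (an assumption; the question does not say so), a more tolerant sketch is to split on ',' and strip each piece before deduplicating. The helper name dedupe_tags_loose and the messy sample values below are made up for illustration.

```python
import pandas as pd


def dedupe_tags_loose(tags: str) -> str:
    """Hypothetical helper: tolerate uneven spacing around commas."""
    # Split on bare commas, then trim whitespace from every piece.
    parts = (p.strip() for p in tags.split(","))
    # Skip empty pieces and keep the first occurrence of each tag, in order.
    return ", ".join(dict.fromkeys(p for p in parts if p))


# Illustrative input with irregular spacing (not from the question).
s = pd.Series(["awesome,awesome ,  great", "cool, fun"])
result = s.apply(dedupe_tags_loose)
print(result.tolist())
```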

Upvotes: 1
