Larry

Concatenate strings without duplicates

I have the following dataframe:

id  v1  v2
1   a   b
1   a   d
2   c   e
2   d   e
2   f   g

I need to concatenate the values of v1 and v2 with commas within each id group. However, if a value already appears in the group, it should not be added again. So the output should look like:

id  v1  v2
1   a   b,d
2   c,d,f   e,g

I have done concatenation part, but I am not sure how to skip the duplicated values. Here is my code so far:

df.groupby(['id'])[['v1', 'v2']].agg(lambda x: ', '.join(x)).reset_index()
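For reference, here is the attempt above made runnable against the sample data (the DataFrame construction is added for illustration). It shows where the duplicates come from: `', '.join(x)` joins every value in the group, repeats included.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "v1": ["a", "a", "c", "d", "f"],
    "v2": ["b", "d", "e", "e", "g"],
})

# The original attempt: every value in the group is joined, duplicates included
result = df.groupby(["id"])[["v1", "v2"]].agg(lambda x: ", ".join(x)).reset_index()
print(result)
```

With this data, id 1 gets `a, a` in v1 and id 2 gets `e, e, g` in v2.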

Upvotes: 1

Views: 1275

Answers (1)

Quang Hoang

You can wrap x in a set to get rid of the duplicates:

(df.groupby(['id'])[['v1', 'v2']]
   .agg(lambda x: ', '.join(set(x)))
   .reset_index()
)

Output:

   id       v1    v2
0   1        a  d, b
1   2  c, d, f  g, e
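A set is unordered, which is why `b, d` comes back as `d, b` in the output above. If the original order matters, one option (a swapped-in alternative, not part of the answer above) is `dict.fromkeys`, which drops repeats while keeping first-occurrence order in Python 3.7+. A minimal sketch, assuming the same sample data; `join_unique` is a hypothetical helper name, and `sep=","` is chosen to match the expected output in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "v1": ["a", "a", "c", "d", "f"],
    "v2": ["b", "d", "e", "e", "g"],
})

def join_unique(values, sep=","):
    # Hypothetical helper: dict.fromkeys keeps only the first occurrence of
    # each value and, unlike set, preserves insertion order
    return sep.join(dict.fromkeys(values))

result = df.groupby("id")[["v1", "v2"]].agg(join_unique).reset_index()
print(result)
```

This yields `a` / `b,d` for id 1 and `c,d,f` / `e,g` for id 2, in the order the values first appear.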

Upvotes: 3
