Reputation: 157
I have the following dataframe:
id v1 v2
1 a b
1 a d
2 c e
2 d e
2 f g
I need to concatenate the v1 and v2 values with commas within each id group. However, if a value already appears in the group, it should not be repeated. The output should look like:
id v1 v2
1 a b,d
2 c,d,f e,g
I have done the concatenation part, but I am not sure how to skip the duplicate values. Here is my code so far:
df.groupby(['id'])[['v1', 'v2']].agg(lambda x: ', '.join(x)).reset_index()
Upvotes: 1
Views: 1275
Reputation: 150735
You can wrap x in a set to get rid of the duplicates (note that a set is unordered, so the joined values may not come out in their original order):
(df.groupby(['id'])[['v1', 'v2']]
.agg(lambda x: ', '.join(set(x)))
.reset_index()
)
Output:
id v1 v2
0 1 a d, b
1 2 c, d, f g, e
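If the order of first appearance matters (as in the desired output in the question), one possible variation is to deduplicate with dict.fromkeys instead of set, since dicts keep insertion order. This is only a sketch: the DataFrame below is rebuilt from the sample data in the question, and the "," separator without a space is an assumption taken from the expected output shown there.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "v1": ["a", "a", "c", "d", "f"],
    "v2": ["b", "d", "e", "e", "g"],
})

# dict.fromkeys keeps only the first occurrence of each value and
# preserves the order in which the values appear in the group.
result = (
    df.groupby("id")[["v1", "v2"]]
      .agg(lambda x: ",".join(dict.fromkeys(x)))
      .reset_index()
)
print(result)
```

Using ",".join(x.unique()) inside the lambda should behave the same way, because Series.unique returns values in order of appearance.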
Upvotes: 3