I have this pandas DataFrame:
import pandas as pd
df = pd.DataFrame({'id':['a','b','b','b','c','c'], 'category':['z','z','x','y','y','y'], 'category2':['1','2','2','2','1','2']})
which looks like:
  category category2 id
0        z         1  a
1        z         2  b
2        x         2  b
3        y         2  b
4        y         1  c
5        y         2  c
What I'd like to do is group by id and return each of the other two columns as a concatenation of its unique strings.
The desired result would look like:
  category category2 id
0        z         1  a
1      zxy         2  b
2        y        12  c
Use groupby/agg to aggregate the groups. For each group, apply set to find the unique strings, and ''.join to concatenate them:
In [34]: df.groupby('id').agg(lambda x: ''.join(set(x)))
Out[34]:
   category category2
id
a         z         1
b       yxz         2
c         y        12
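Note that set has no defined iteration order, which is why group b comes out as yxz above rather than the zxy asked for; for strings the order can even change between Python runs. The minimal sketch below applies the same joining step to group b's category values by hand and compares set with two deterministic alternatives: sorted(set(...)) for alphabetical order and pd.unique, which keeps the order of first appearance.

```python
import pandas as pd

# The 'category' values of group 'b' from the example frame
group_b = pd.Series(['z', 'x', 'y'])

# set() removes duplicates but its order is arbitrary
unordered = ''.join(set(group_b))
# sorted() makes the result deterministic and alphabetical
alphabetical = ''.join(sorted(set(group_b)))
# pd.unique() removes duplicates and keeps first-appearance order
first_seen = ''.join(pd.unique(group_b))

print(unordered, alphabetical, first_seen)
```

Swapping either alternative into the lambda gives stable output from run to run.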
To move id from the index to a column of the resulting DataFrame, call reset_index:
In [59]: df.groupby('id').agg(lambda x: ''.join(set(x))).reset_index()
Out[59]:
  id category category2
0  a        z         1
1  b      yxz         2
2  c        y        12
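If the concatenated characters should appear in the order they first occur (zxy for group b, as in the question), one option is to replace set with pd.unique. A self-contained sketch of the whole pipeline under that assumption:

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'b', 'b', 'b', 'c', 'c'],
                   'category': ['z', 'z', 'x', 'y', 'y', 'y'],
                   'category2': ['1', '2', '2', '2', '1', '2']})

# Join each group's unique values in order of first appearance,
# then move 'id' from the index back into a regular column.
result = (df.groupby('id')
            .agg(lambda x: ''.join(pd.unique(x)))
            .reset_index())
print(result)
```

With this data the rows should read a/z/1, b/zxy/2 and c/y/12, matching the desired result in the question.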