Reputation: 7593
I'm trying to do a groupby where in one column I have string categorical data:
ID cat_1 cat_2
11 'OG' 'ASD'
11 'LOL' 'ASD'
11 'OG' 'DFG'
22 'LOL' 'DFG'
22 'OG' 'DFG'
And I'm trying to group by the ID, and aggregate the string data into a numeric feature, namely, the counts of occurrences for each category. So the outcome would be:
ID OG LOL ASD DFG
11 2 1 2 1
22 1 1 0 2
How can I achieve this in pandas? Thank you!
Upvotes: 0
Views: 505
Reputation: 23166
You could use pd.get_dummies
with groupby
and stack
:
>>> pd.get_dummies(df.set_index("ID").stack()).groupby("ID").sum()
ASD DFG LOL OG
ID
11 2 1 1 2
22 0 2 1 1
Upvotes: 1
Reputation: 262634
You can stack/value_counts/unstack:
(df.set_index('ID')
.stack()
.groupby('ID')
.value_counts()
.unstack(fill_value=0)
)
NB. you can add .reset_index()
if you want all columns
output:
ASD DFG LOL OG
ID
11 2 1 1 2
22 0 2 1 1
Upvotes: 2