lte__
lte__

Reputation: 7593

Pandas groupby - count unique into separate columns per group

I'm trying to do a groupby where in one column I have string categorical data:

ID   cat_1   cat_2
11   'OG'    'ASD'
11   'LOL'   'ASD' 
11   'OG'    'DFG' 
22   'LOL'   'DFG'
22   'OG'    'DFG'

And I'm trying to group by the ID, and aggregate the string data into a numeric feature, namely, the counts of occurrences for each category. So the outcome would be:

ID  OG  LOL  ASD  DFG
11   2    1    2    1
22   1    1    0    2

How can I achieve this in pandas? Thank you!

Upvotes: 0

Views: 505

Answers (2)

not_speshal
not_speshal

Reputation: 23166

You could use pd.get_dummies with groupby and stack:

>>> pd.get_dummies(df.set_index("ID").stack()).groupby("ID").sum()
    ASD  DFG  LOL  OG
ID                   
11    2    1    1   2
22    0    2    1   1

Upvotes: 1

mozway
mozway

Reputation: 262634

You can stack/value_counts/unstack:

(df.set_index('ID')
   .stack()
   .groupby('ID')
   .value_counts()
   .unstack(fill_value=0)
)

NB. you can add .reset_index() if you want all columns

output:

    ASD  DFG  LOL  OG
ID                   
11    2    1    1   2
22    0    2    1   1

Upvotes: 2

Related Questions