John Stud
John Stud

Reputation: 1779

Pandas: How to aggregate multiple columns into value counts, generating new columns for each value count?

How might I aggregate data such that new columns are generated for each column's value count?

import pandas as pd
month = pd.Series(['June', 'June', 'June', 'June'])
day = pd.Series(['1', '1', '1', '1'])
hour = pd.Series(['1', '2', '3', '4'])
cat1 = pd.Series(['M', 'M', 'M', 'F'])
cat2 = pd.Series(['F', 'F', 'M', 'M'])


dict1 = {'month': month, 'day': day, 'hour': hour, 'cat1': cat1, 'cat2': cat2}
df = pd.DataFrame(dict1)

month  day hour cat1 cat2
June    1   1   M   F
June    1   2   M   F
June    1   3   M   M
June    1   4   F   M

# misses the mark
df.groupby(['month', 'day']).apply(pd.value_counts)
# expected
month  day cat1_M   cat1_F  cat2_M  cat2_F
June    1    3       1        2        2

Upvotes: 1

Views: 57

Answers (1)

wwnde
wwnde

Reputation: 26676

Please get_dummies on the cats' columns. Follow that with groupby agg(sum)

 pd.get_dummies(data=df, columns=['cat1', 'cat2']).groupby(['month','day']).agg('sum').reset_index()



   month day  cat1_F  cat1_M  cat2_F  cat2_M
0  June   1       1       3       2       2

Upvotes: 4

Related Questions