Reputation: 1779
How might I aggregate data such that new columns are generated for each column's value count?
import pandas as pd
month = pd.Series(['June', 'June', 'June', 'June'])
day = pd.Series(['1', '1', '1', '1'])
hour = pd.Series(['1', '2', '3', '4'])
cat1 = pd.Series(['M', 'M', 'M', 'F'])
cat2 = pd.Series(['F', 'F', 'M', 'M'])
dict1 = {'month': month, 'day': day, 'hour': hour, 'cat1': cat1, 'cat2': cat2}
df = pd.DataFrame(dict1)
month day hour cat1 cat2
June 1 1 M F
June 1 2 M F
June 1 3 M M
June 1 4 F M
# misses the mark
df.groupby(['month', 'day']).apply(pd.value_counts)
# expected
month day cat1_M cat1_F cat2_M cat2_F
June 1 3 1 2 2
Upvotes: 1
Views: 57
Reputation: 26676
Please get_dummies on the cats' columns. Follow that with groupby agg(sum)
pd.get_dummies(data=df, columns=['cat1', 'cat2']).groupby(['month','day']).agg('sum').reset_index()
month day cat1_F cat1_M cat2_F cat2_M
0 June 1 1 3 2 2
Upvotes: 4