Reputation: 389
I am currently looking to aggregate a dataframe by many categorical columns and sum up several metric columns as well. I am trying to do this similiar to how I would in SQL but I cant seem to find a simple method. I also am not sure if I am at the limits of pandas group by as the code below returns a keyerror on the second metric column. the code will run if I only aggregate one column. How do I aggregate multiple columns?
df_agg = pd.DataFrame(data = df.groupby(['House', 'cat1', 'cat2', 'cat3'])
['points'].mean()
['counts'].count()
['value'].sum()
['metric'].sum()
['metric2'].sum()
['metric3'].sum())
Upvotes: 1
Views: 328
Reputation: 863256
Use agg
by dictionary
of columns with aggregate functions, DataFrame
contructor is not necessary:
d = {'points':'mean', 'counts':'count','value':'sum','metric':'sum','metric1':'sum','metric2':'sum'}
df_agg = df.groupby(['House', 'cat1', 'cat2', 'cat3']).agg(d).reset_index()
print (df_agg)
Upvotes: 2