mitch
mitch

Reputation: 389

trying to aggregate multiple columns in groupy by using python similiar to SQL

I am currently looking to aggregate a dataframe by many categorical columns and sum up several metric columns as well. I am trying to do this similiar to how I would in SQL but I cant seem to find a simple method. I also am not sure if I am at the limits of pandas group by as the code below returns a keyerror on the second metric column. the code will run if I only aggregate one column. How do I aggregate multiple columns?

df_agg = pd.DataFrame(data = df.groupby(['House', 'cat1', 'cat2', 'cat3'])
['points'].mean()
['counts'].count()
['value'].sum()
['metric'].sum()
['metric2'].sum()
['metric3'].sum())  

Upvotes: 1

Views: 328

Answers (1)

jezrael
jezrael

Reputation: 863256

Use agg by dictionary of columns with aggregate functions, DataFrame contructor is not necessary:

d = {'points':'mean', 'counts':'count','value':'sum','metric':'sum','metric1':'sum','metric2':'sum'}
df_agg = df.groupby(['House', 'cat1', 'cat2', 'cat3']).agg(d).reset_index()
print (df_agg)

Upvotes: 2

Related Questions