Reputation: 15
I would like to summarize by dataset based on two columns and get both the 'sum' and 'count' of these columns. I know how to get the 'sum' and 'count' of the columns separately, for example:
summary = df.groupby(['country','item', "date", 'msgtype'], as_index=False)['size', 'turnover'].sum()
But I want to be able to do it in one line if possible, without having to do merges etc.
I tried to do this:
summary = df.groupby(['country','item', "date", 'order'], as_index=False).agg({(['size', 'weight']) : ['sum' , 'count']})
But I get the error:
unhashable type: 'list'
So I'm not too sure what to do. If anyone could help me with this problem I would really appreciate it, thanks!
Upvotes: 0
Views: 482
Reputation: 23021
Try this.
# aggregate size and weight by country, item, date and order
summary = df.groupby(['country','item', "date", 'order'])[['size', 'weight']].agg(['sum' , 'count'])
# flatten column names
summary.columns = [f"{x}_{y}" for x,y in summary.columns]
# make groupers columns
summary = summary.reset_index()
It groups by country, item, date and order and aggregates size and weight columns by their respective sum and counts.
Upvotes: 1