Grouped Aggregation on multiple columns in Pandas

Question

I would like to perform several aggregations on a groupby object, possibly on different columns and possibly with more than one aggregation per column.

Example

In [1]: from pandas import *

In [2]: df = DataFrame([[1, 'Alice',   100],
                        [2, 'Bob',    -200],
                        [3, 'Alice',   300],
                        [4, 'Dennis',  400],
                        [5, 'Bob',    -500]], 
               columns=['id', 'name', 'amount'])

In [3]: g = df.groupby('name')

In [4]: g.summarize({'num_ids': g.id.nunique(), 
                     'total_amount': g.amount.sum(),
                     'max_amount': g.amount.max()})

I understand that this is not valid syntax. I hope that it is clear what I'm trying to achieve.

What is the best way to accomplish this with Pandas?

chrisb · Accepted Answer

This is right out of the docs: pass agg a dict that maps each column to one aggregation function or a list of them.

gb = g.agg({'id': 'nunique',
            'amount': ['sum', 'max']})

The result has two-level column labels, one per (column, function) pair. If you want to rename the columns, just assign to .columns.

gb.columns = ['num_ids', 'total_amount', 'max_amount']
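
If your pandas is recent enough (0.25 or later, as far as I know), named aggregation may be a closer match to the summarize call in the question, since it sets the output column names in the same step. A minimal sketch, reusing the sample data from the question; the variable name result is just for illustration:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame([[1, 'Alice', 100],
                   [2, 'Bob', -200],
                   [3, 'Alice', 300],
                   [4, 'Dennis', 400],
                   [5, 'Bob', -500]],
                  columns=['id', 'name', 'amount'])

# Named aggregation: each keyword is an output column name,
# each value is a (source column, aggregation function) pair.
result = df.groupby('name').agg(
    num_ids=('id', 'nunique'),
    total_amount=('amount', 'sum'),
    max_amount=('amount', 'max'),
)

print(result)
```

For the sample data this should give one row per name, with Alice at 2 ids, a total of 400 and a max of 300. The columns come out flat, so no separate renaming step is needed.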
