Max Candocia
Max Candocia

Reputation: 4386

How do I aggregate multiple columns with one function in pandas when using groupby?

I have a data frame with a "group" variable, a "count" variable, and a "total" variable. For each group I want to sum the count column and divide that by the sum of the total column. How do I do this, ideally in one line of code?

Here is an example to work with:

test_dc = {1:{'group':'A','cnt':3,'total':5},
           2:{'group':'B','cnt':1,'total':8},
           3:{'group':'A','cnt':2,'total':4},
           4:{'group':'B','cnt':6,'total':13}
          }
test_df = pd.DataFrame.from_dict(test_dc, orient='index')

Expected output (roughly):

group | average
    A | 0.55555
    B | 0.33333

Edit: changed column name from "count" to "cnt" because there seems to be an existing count() method on groupby objects.

Upvotes: 1

Views: 3486

Answers (2)

piRSquared
piRSquared

Reputation: 294218

I'd use a combination of agg and eval

test_df.groupby('group').agg('sum').eval('cnt / total')

group
A    0.555556
B    0.333333
dtype: float64

Upvotes: 2

Aleksey Bilogur
Aleksey Bilogur

Reputation: 3846

You can use DataFrame.groupby to group by a column, and then call sum on that to get the sums.

>>> df = test_df
    .groupby('group')
    .sum()
>>> df      
       count  total
group              
A          5      9
B          7     21

Then you can grab the column and divide them through to get your answer.

>>> df['count'] / df['total']
    group
A    0.555556
B    0.333333
dtype: float64

You can do this in one line by taking advantage of the DataFrame.pipe operator:

test_df
    .groupby('group')
    .sum()
    .pipe(lambda df: df['count'] / df['total'])

Upvotes: 4

Related Questions