Simd

Reputation: 21263

How to divide the sum with the size in a pandas groupby

I have a dataframe like this:

  ID_0 ID_1  ID_2
0    a    b     1
1    a    c     1
2    a    b     0
3    d    c     0
4    a    c     0
5    a    c     1

I would like to group by ['ID_0', 'ID_1'] and produce a new dataframe that holds, for each group, the sum of the ID_2 values divided by the number of rows in that group.

grouped = df.groupby(['ID_0', 'ID_1'])
print(grouped.agg({'ID_2': np.sum}), "\n", grouped.size())

gives

           ID_2
ID_0 ID_1
a    b        1
     c        2
d    c        0
ID_0  ID_1
a     b       2
      c       3
d     c       1
dtype: int64

How can I get the new dataframe with the np.sum values divided by the size() values?

Upvotes: 11

Views: 23191

Answers (2)

pdaawr

Reputation: 588

Instead of apply(), a vectorised approach that aggregates the sum and the group size in one pass may be quicker:

df['count'] = df['ID_0']  # helper column; only its per-group size is used

df = df.groupby(['ID_0', 'ID_1']).agg({
    'ID_2': 'sum',     # per-group total of ID_2
    'count': 'size'    # per-group number of rows
}).reset_index()

df['ID_2'] /= df['count']        # sum divided by size
df = df.drop(columns=['count'])  # drop the helper column
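
The helper column can probably be avoided by dividing the two groupby results directly, since both are indexed by (ID_0, ID_1). A minimal sketch, assuming the six-row frame from the question (rebuilt here so the snippet runs on its own); the names ratio and result are only illustrative:

```python
import pandas as pd

# Sample frame copied from the question
df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})

grouped = df.groupby(['ID_0', 'ID_1'])

# Both Series share the (ID_0, ID_1) index, so the division aligns group by group
ratio = grouped['ID_2'].sum() / grouped.size()

# Name the result and turn the group keys back into ordinary columns
result = ratio.rename('ID_2').reset_index()
print(result)
```

This keeps the original df untouched, which may matter if it is needed again later.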

Upvotes: 0

Nickil Maveli

Reputation: 29711

Use groupby.apply instead:

df.groupby(['ID_0', 'ID_1']).apply(lambda x: x['ID_2'].sum()/len(x))

ID_0  ID_1
a     b       0.500000
      c       0.666667
d     c       0.000000
dtype: float64
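
A sum divided by a row count is the arithmetic mean, so the built-in groupby mean should give the same numbers here. A minimal sketch, again assuming the frame from the question; note that mean() skips missing values while len(x) counts every row, so the two only agree when ID_2 has no NaN:

```python
import pandas as pd

# Sample frame copied from the question
df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})

# Per-group mean of ID_2; equal to sum / size when there are no missing values
means = df.groupby(['ID_0', 'ID_1'])['ID_2'].mean()

# Optional: flatten the group keys into columns to get a plain dataframe
result = means.reset_index()
print(result)
```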

Upvotes: 21
