How to divide the sum with the size in a pandas groupby

Question

I have a dataframe like

  ID_0 ID_1  ID_2
0    a    b     1
1    a    c     1
2    a    b     0
3    d    c     0
4    a    c     0
5    a    c     1

I would like to group by ['ID_0','ID_1'] and produce a new dataframe that holds, for each group, the sum of the ID_2 values divided by the number of rows in that group.

grouped = df.groupby(['ID_0', 'ID_1'])
print(grouped.agg({'ID_2': np.sum}), "\n", grouped.size())

gives

           ID_2
ID_0 ID_1
a    b        1
     c        2
d    c        0
ID_0  ID_1
a     b       2
      c       3
d     c       1
dtype: int64

How can I get the new dataframe with the np.sum values divided by the size() values?

Nickil Maveli · Accepted Answer

Use groupby.apply instead:

df.groupby(['ID_0', 'ID_1']).apply(lambda x: x['ID_2'].sum()/len(x))

ID_0  ID_1
a     b       0.500000
      c       0.666667
d     c       0.000000
dtype: float64
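
As an alternative sketch (not part of the original answer), the same ratio can be computed without apply by dividing the grouped sum by the grouped size directly. The dataframe below is rebuilt from the question's sample, and the column name 'ratio' is only an illustrative choice. Assuming ID_2 contains no missing values, the result also equals the group mean.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})

grouped = df.groupby(['ID_0', 'ID_1'])

# Explicit form: per-group sum divided by per-group row count.
# Both Series share the same (ID_0, ID_1) index, so they align.
ratio = grouped['ID_2'].sum() / grouped.size()

# Shortcut: identical to the group mean as long as ID_2 has no NaN.
mean = grouped['ID_2'].mean()

# Turn the result into a regular dataframe with a named column.
# 'ratio' is a hypothetical column name chosen for this example.
result = ratio.rename('ratio').reset_index()
print(result)
```

If ID_2 can contain NaN, the two forms differ: sum() skips missing values while size() counts every row, whereas mean() divides by the number of non-missing values only.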
