I have a dataframe like
  ID_0 ID_1  ID_2
0    a    b     1
1    a    c     1
2    a    b     0
3    d    c     0
4    a    c     0
5    a    c     1
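To make the example reproducible, the sample frame can be rebuilt in code. This is a minimal sketch that assumes the values are transcribed exactly from the table above, with the default 0..5 row index.

```python
import pandas as pd

# Rebuild the sample frame from the table (values transcribed by hand)
df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})
print(df)
```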
I would like to group by ['ID_0', 'ID_1'] and produce a new DataFrame containing, for each group, the sum of the ID_2 values divided by the number of rows in that group.
import numpy as np
grouped = df.groupby(['ID_0', 'ID_1'])
print(grouped.agg({'ID_2': np.sum}), "\n", grouped.size())
gives
           ID_2
ID_0 ID_1
a    b        1
     c        2
d    c        0

ID_0  ID_1
a     b       2
      c       3
d     c       1
dtype: int64
How can I get the new dataframe with the np.sum values divided by the size() values?
Instead of using apply(), it might be quicker to use a vectorised approach:
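# Copy a column so the rows of each group can be counted alongside the sum
df['count'] = df['ID_0']
df = df.groupby(['ID_0', 'ID_1']).agg({
    'ID_2': 'sum',    # total of ID_2 per group
    'count': 'size',  # number of rows per group
}).reset_index()
df['ID_2'] /= df['count']  # sum divided by group size, for all groups at once
df = df.drop(['count'], axis=1)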
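The same division can also be done without the helper `count` column. Summing ID_2 and calling `size()` on the grouped frame each return a Series indexed by the (ID_0, ID_1) groups, so dividing one by the other lines up group by group. This is a minimal sketch assuming the sample data from the question; the names `ratio` and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})

grouped = df.groupby(['ID_0', 'ID_1'])
# Both Series share the (ID_0, ID_1) MultiIndex, so "/" divides per group
ratio = grouped['ID_2'].sum() / grouped.size()
# Turn the MultiIndex back into ordinary columns
result = ratio.reset_index(name='ID_2')
print(result)
```

Because the division aligns on the group index, it stays vectorised and no row-wise Python function is called.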
Use groupby.apply instead:
df.groupby(['ID_0', 'ID_1']).apply(lambda x: x['ID_2'].sum()/len(x))
ID_0  ID_1
a     b       0.500000
      c       0.666667
d     c       0.000000
dtype: float64
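A group's sum divided by its row count is simply the group mean, so the built-in `mean` aggregation should give the same numbers without a lambda. This sketch assumes ID_2 has no missing values: `mean` skips NaN, while `len(x)` in the lambda above counts every row, so the two would differ if NaNs were present. The names `means` and `check` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID_0': ['a', 'a', 'a', 'd', 'a', 'a'],
    'ID_1': ['b', 'c', 'b', 'c', 'c', 'c'],
    'ID_2': [1, 1, 0, 0, 0, 1],
})

# sum / number of rows equals the mean when ID_2 contains no NaN
means = df.groupby(['ID_0', 'ID_1'])['ID_2'].mean()

# Explicit sum / size per group, for comparison
check = df.groupby(['ID_0', 'ID_1'])['ID_2'].agg(lambda s: s.sum() / len(s))
print(means)
```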