Reputation: 504
I am trying to find the min, max, mean, sum and std of some columns of a GroupBy object in Pandas. To do that, my original code was this:
df_agg = df.groupby('id')[column_list].agg(['mean', 'max', 'min', 'sum', 'std'])
But this was producing a lot of NaNs in the std columns. Upon searching, I found that std accepts an argument, ddof (Delta Degrees of Freedom), which is set to 1 by default. The divisor used is N - ddof, so for any group with a single row the divisor is 0, and that is what gives those NaN values.
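A minimal sketch of that effect, using made-up data (the column name value and the numbers are assumptions chosen only for illustration). Group b has a single row, so the default ddof=1 gives NaN there, while ddof=0 gives 0.0:

```python
import pandas as pd

# Hypothetical data: group 'a' has two rows, group 'b' has only one.
df = pd.DataFrame({'id': ['a', 'a', 'b'], 'value': [1.0, 3.0, 5.0]})
grouped = df.groupby('id')['value']

# Default ddof=1: the divisor is N - 1, which is 0 for the one-row group.
sample_std = grouped.std()

# ddof=0: the divisor is N, so the one-row group gets 0.0 instead of NaN.
population_std = grouped.std(ddof=0)

print(sample_std)
print(population_std)
```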
Now, I want to pass the argument ddof=0 to the std that is used in the above code, but I don't understand how I can do that.
Please help.
Upvotes: 3
Views: 2751
Reputation: 862701
You can create a custom lambda function that calls std with ddof=0, and set its __name__ so the resulting column gets a readable label:
f = lambda x: x.std(ddof=0)
f.__name__ = 'std_0'
df_agg = df.groupby('id')[column_list].agg(['mean', 'max', 'min', 'sum', f])
Sample:
import pandas as pd

df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4],
                   'id': list('aaabbb')})
print(df)
   A  B  C  D  E id
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b
column_list = ['C','D','E']
f = lambda x: x.std(ddof=0)
f.__name__ = 'std_0'
df_agg = df.groupby('id')[column_list].agg(['mean', 'max', 'min', 'sum', f])
print(df_agg)
      C                            D                                 E      \
   mean max min sum     std_0      mean max min sum     std_0      mean max
id
a     8   9   7  24  0.816497  3.000000   5   1   9  1.632993  4.666667   6
b     3   4   2   9  0.816497  2.666667   7   0   8  3.091206  5.000000   9

   min sum     std_0
id
a    3  14  1.247219
b    2  15  2.943920
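As a possible variation, a regular named function should work the same way as the lambda and avoids assigning __name__ by hand, since the column label is taken from the function's name. The sketch below is only an illustration on a reduced copy of the sample data (column C only); it also cross-checks one value against numpy.std, whose default is ddof=0:

```python
import numpy as np
import pandas as pd

# Reduced copy of the sample data: only column C and the grouping key.
df = pd.DataFrame({'C': [7, 8, 9, 4, 2, 3],
                   'id': list('aaabbb')})

def std_0(x):
    """Population standard deviation (ddof=0) of one group's column."""
    return x.std(ddof=0)

# The function's own name, std_0, becomes the column label.
df_agg = df.groupby('id')[['C']].agg(['mean', std_0])
print(df_agg)

# Cross-check group 'a' against NumPy, which uses ddof=0 by default.
expected_a = np.std([7, 8, 9])
print(np.isclose(df_agg.loc['a', ('C', 'std_0')], expected_a))
```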
Upvotes: 7