Nityesh Agarwal

Reputation: 504

Passing arguments to a list of functions in Pandas GroupBy `agg()`

I am trying to compute the min, max, mean, sum and std of some columns of a GroupBy object in Pandas. My original code was:

df_agg = df.groupby('id')[column_list].agg(['mean', 'max', 'min', 'sum', 'std'])

But this produced a lot of NaNs in the std columns. Upon searching, I found that std accepts an argument, ddof (Delta Degrees of Freedom), which is set to 1 by default. The divisor used is N - ddof, so for groups with a single row this becomes a division by 0 and gives those NaN values.
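A minimal sketch of the effect described above, using a small made-up frame (the data and the variable names `std_default` and `std_population` are illustrative, not from my real dataset). Group 'a' has only one row, so the default ddof=1 should give NaN there, while ddof=0 should give 0.0:

```python
import pandas as pd

# Hypothetical data: group 'a' has a single row, group 'b' has two rows.
df = pd.DataFrame({'id': ['a', 'b', 'b'], 'C': [7, 4, 2]})

# Default ddof=1: the divisor is N - 1, which is 0 for the single-row group.
std_default = df.groupby('id')['C'].std()

# ddof=0: the divisor is N, so the single-row group no longer divides by 0.
std_population = df.groupby('id')['C'].std(ddof=0)

print(std_default)
print(std_population)
```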

Now I want to pass ddof=0 to the std used in the above code, but I don't understand how to do that.

Please help.

Upvotes: 3

Views: 2751

Answers (1)

jezrael

Reputation: 862701

You can create a custom lambda function and give it a name, which is then used as the column label:

f = lambda x: x.std(ddof=0)
f.__name__ = 'std_0'
df_agg = df.groupby('id')[column_list].agg(['mean', 'max', 'min', 'sum', f])

Sample:

import pandas as pd

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'id':list('aaabbb')})

print (df)
   A  B  C  D  E id
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b

column_list = ['C','D','E']

f = lambda x: x.std(ddof=0)
f.__name__ = 'std_0'
df_agg = df.groupby('id')[column_list].agg(['mean', 'max', 'min', 'sum', f])
print (df_agg)
      C                               D                               E      \
   mean max min sum     std_0      mean max min sum     std_0      mean max   
id                                                                            
a     8   9   7  24  0.816497  3.000000   5   1   9  1.632993  4.666667   6   
b     3   4   2   9  0.816497  2.666667   7   0   8  3.091206  5.000000   9   


   min sum     std_0  
id                    
a    3  14  1.247219  
b    2  15  2.943920  
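As a possible variation on the lambda above, a regular named function should work the same way, since pandas takes the column label from the function's `__name__`. This is only a sketch under that assumption; it reuses the name `std_0` and a reduced version of the sample frame:

```python
import pandas as pd

def std_0(x):
    """Population standard deviation (ddof=0) of one group's column."""
    return x.std(ddof=0)

# Reduced version of the sample data, keeping only columns C, D and id.
df = pd.DataFrame({'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'id': list('aaabbb')})

# The function object is passed in the list next to the string names.
df_agg = df.groupby('id')[['C', 'D']].agg(['mean', 'max', 'min', 'sum', std_0])
print(df_agg)
```

The result has the same column layout as the lambda version, with `std_0` as the label of the last column under each of C and D.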

Upvotes: 7
