Obtain the standard deviation of a grouped dataframe column

Question

I am trying to obtain the (sample) standard deviation of a column's values, grouped by another column in my dataframe.

To be concrete, I have something like this:

  col1  col2
0    A    10
1    A     5
2    A     5
3    B     2
4    B    20
2    B    40
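To make the snippets in this thread easy to run, here is a minimal sketch that rebuilds the sample data. It assumes a default 0 to 5 index rather than the repeated index label 2 shown in the listing. The grouping results do not depend on the index.

```python
import pandas as pd

# Rebuild the example frame; a default RangeIndex (0..5) is assumed here
# instead of the repeated index label 2 shown in the listing above.
df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B"],
    "col2": [10, 5, 5, 2, 20, 40],
})
print(df)
```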

And I want to end up with this:

  col1  col2    std
0    A    10   2.89
1    A     5   2.89
2    A     5   2.89
3    B     2  19.01
4    B    20  19.01
2    B    40  19.01
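For reference, the expected values are sample standard deviations (ddof=1, i.e. dividing by n - 1). Here is a worked check for both groups using the values listed above:

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}

\text{A: } \bar{x} = \tfrac{20}{3},\quad
s_A = \sqrt{\tfrac{1}{2}\left[\left(\tfrac{10}{3}\right)^2 + 2\left(-\tfrac{5}{3}\right)^2\right]}
    = \sqrt{\tfrac{25}{3}} \approx 2.8868

\text{B: } \bar{x} = \tfrac{62}{3},\quad
s_B = \sqrt{\tfrac{1}{2}\left[\left(-\tfrac{56}{3}\right)^2 + \left(-\tfrac{2}{3}\right)^2 + \left(\tfrac{58}{3}\right)^2\right]}
    = \sqrt{\tfrac{1084}{3}} \approx 19.0088
```

Rounded to two decimals, these are 2.89 and 19.01. They match the values printed in the accepted answer.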

I tried the following code:

df['std']=df.groupby('col1')['col2'].std(skipna=True, ddof=1)

But I receive the following error:

UnsupportedFunctionCall: numpy operations are not valid with groupby. Use .groupby(...).std() instead

What am I doing wrong here?
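There appear to be two separate problems. First, in the pandas versions that produce this message, GroupBy.std does not take a skipna argument, because grouped reductions already ignore NaN. The unexpected keyword is what triggers UnsupportedFunctionCall. Second, even a valid call such as .std(ddof=1) returns one value per group, indexed by col1. The sketch below reuses the sample data, assumes a default 0 to 5 index, and shows why assigning that result directly produces NaN:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B"],
    "col2": [10, 5, 5, 2, 20, 40],
})

# A plain grouped reduction returns ONE value per group, indexed by col1 ("A", "B").
per_group = df.groupby("col1")["col2"].std(ddof=1)
print(per_group)

# Assigning that Series to a column aligns it on df's index (0..5),
# which shares no labels with "A"/"B", so every row becomes NaN.
df["std_aligned"] = per_group
print(df)
```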


jezrael · Accepted Answer

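Use GroupBy.transform with a lambda function. Unlike a plain .std(), transform returns a result with the same index as the original rows, so it can be assigned directly as a new column: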

df['std'] = df.groupby('col1')['col2'].transform(lambda x: x.std(skipna=True, ddof=1))
print(df)
  col1  col2        std
0    A    10   2.886751
1    A     5   2.886751
2    A     5   2.886751
3    B     2  19.008770
4    B    20  19.008770
2    B    40  19.008770
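The lambda is not strictly needed. Here is a sketch of two other common ways to broadcast the per-group value back to every row. It again assumes a default 0 to 5 index. transform("std") calls GroupBy's own std, which defaults to ddof=1 and skips NaN. Series.map looks up each row's col1 key in the per-group result.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B"],
    "col2": [10, 5, 5, 2, 20, 40],
})

grouped = df.groupby("col1")["col2"]

# Option 1: transform with the built-in reduction name; GroupBy.std
# defaults to ddof=1 (sample standard deviation) and ignores NaN.
df["std_transform"] = grouped.transform("std")

# Option 2: compute one value per group, then map each row's key to it.
df["std_map"] = df["col1"].map(grouped.std())

print(df)
```

Passing the string name lets pandas use its built-in grouped implementation instead of calling a Python function once per group. Both options keep the original index, so the assignment lines up row by row.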
