Reputation: 303
I am trying to obtain the (sample) standard deviation of a column's values, grouped by another column in my dataframe.
To be concrete, I have something like this:
col1 col2
0 A 10
1 A 5
2 A 5
3 B 2
4 B 20
2 B 40
And I am trying to get here:
col1 col2 std
0 A 10 2.89
1 A 5 2.89
2 A 5 2.89
3 B 2 19.01
4 B 20 19.01
2 B 40 19.01
I tried the following code:
df['std']=df.groupby('col1')['col2'].std(skipna=True, ddof=1)
But I receive the following error:
UnsupportedFunctionCall: numpy operations are not valid with groupby. Use .groupby(...).std() instead
What am I doing wrong here?
Thanks!
Upvotes: 1
Views: 154
Reputation: 863651
Use GroupBy.transform with a lambda function, which returns a result aligned to the original rows:
df['std'] = df.groupby('col1')['col2'].transform(lambda x: x.std(skipna=True, ddof=1))
print(df)
col1 col2 std
0 A 10 2.886751
1 A 5 2.886751
2 A 5 2.886751
3 B 2 19.008770
4 B 20 19.008770
2 B 40 19.008770
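As for why the original attempt fails: the error most likely comes from the skipna argument, which GroupBy.std did not accept in the pandas version that raised it (Series.std does accept it, which is why the lambda works). Separately, a plain groupby(...).std() returns one value per group, indexed by the group labels rather than by the original rows, so assigning it straight to a column would not line up. The sketch below, which assumes the default ddof=1 and rebuilds the sample frame with a fresh 0..5 index, shows the difference, along with a lambda-free transform('std') and a map-based alternative. The names per_group and std_mapped are illustrative only.

```python
import pandas as pd

# Sample data from the question (rebuilt here with a default 0..5 index).
df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B"],
    "col2": [10, 5, 5, 2, 20, 40],
})

# Plain aggregation: one value per group, indexed by the labels "A" and "B".
per_group = df.groupby("col1")["col2"].std(ddof=1)

# transform broadcasts each group's result back onto the original rows.
# The string "std" uses the sample standard deviation (ddof=1) by default.
df["std"] = df.groupby("col1")["col2"].transform("std")

# Alternative: look up each row's group value in the aggregated Series.
df["std_mapped"] = df["col1"].map(per_group)

print(per_group)
print(df)
```

Both new columns should hold the same values as the lambda version; missing values are skipped by the groupby aggregation by default, so skipna=True is not needed here.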
Upvotes: 1