Strange behavior when using lambda function on Pandas's groupby

Question

I have a pandas DataFrame with two groups, 'A' and 'B', and one value is missing in each group.

import numpy as np
import pandas as pd

df4 = pd.DataFrame({'Name': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                    'X': [0, 0.5, 1, np.nan, 1, np.nan, 1]})

Name    X
A       0.0
A       0.5
A       1.0
A       nan
B       1.0
B       nan
B       1.0

I would like to use a lambda function to fill in the missing data for each group.

Correct behavior when using `x.mean()`

df4.groupby('Name')['X'].transform(lambda x: x.fillna(x.mean()))
0    0.0
1    0.5
2    1.0
3    0.5 <------ Filled as 0.5
4    1.0
5    1.0 <------ Filled as 1
6    1.0

With x.mean(), as shown above, the behavior is correct: in group A the mean of the three non-missing values is 1.5/3 = 0.5, and in group B it is 2/2 = 1.

Strange behavior when using `x.std()`

However, if I use x.std() instead, the filled number doesn't make sense to me. Group A has only three existing elements, 0, 0.5, and 1.0, and their standard deviation should be 0.408. Yet the lambda function gives me the following output.

df4.groupby('Name')['X'].transform(lambda x: x.fillna(x.std()))
0    0.0
1    0.5
2    1.0
3    0.5 <------ Filled as 0.5 instead of 0.4082
4    1.0
5    0.0 <------ Correct
6    1.0

Can anyone explain this behavior? Where does that 0.5 come from?

jezrael · Accepted Answer

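pandas.Series.std defaults to ddof=1, which computes the sample standard deviation (dividing by N - 1). For group A that is sqrt(0.5 / 2) = 0.5, which is where the 0.5 comes from. The 0.408 you expected is the population standard deviation (dividing by N); to get it, change the default ddof=1 to ddof=0: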

print(df4.groupby('Name')['X'].transform(lambda x: x.fillna(x.std(ddof=0))))
0    0.000000
1    0.500000
2    1.000000
3    0.408248
4    1.000000
5    0.000000
6    1.000000
Name: X, dtype: float64
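
To see the difference in isolation, here is a minimal sketch that computes both variants on group A alone. The names group_a, sample_std, population_std, and numpy_std are only illustrative and do not appear in the question. It also shows NumPy's np.nanstd, which defaults to ddof=0 and is a likely source of the 0.408 expectation.

```python
import numpy as np
import pandas as pd

# Group A from the question: three known values and one missing value.
group_a = pd.Series([0, 0.5, 1, np.nan])

# pandas skips NaN and defaults to ddof=1 (divide by N - 1 = 2).
sample_std = group_a.std()

# ddof=0 divides by N = 3 instead.
population_std = group_a.std(ddof=0)

# NumPy's nanstd defaults to ddof=0, so it matches the population value.
numpy_std = np.nanstd(group_a.to_numpy())

print(sample_std, population_std, numpy_std)
```

The sum of squared deviations from the mean is 0.5 in both cases; only the divisor changes, giving sqrt(0.5 / 2) = 0.5 versus sqrt(0.5 / 3) ≈ 0.408.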
