Daniela

Reputation: 911

Standard deviation on group returns null

Pandas beginner here. I would like to calculate the standard deviation of SPEED for ships that depart on the same date.

I've tried the following, but the new column contains only NaN. Why is that? The expected result for 21/10/2019 is 9.9500418759588.

import pandas as pd
df = pd.DataFrame({ 'LEFT PORT DATE':['21/10/2019','21/10/2019','21/10/2019','20/10/2019'], 'SPEED':[10, 20, 0.10, 50]})

df['RUN STD DEVIATION'] = df.groupby('LEFT PORT DATE')['SPEED'].std()

Upvotes: 1

Views: 349

Answers (1)

ansev

Reputation: 30940

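This happens because the result of grouping by 'LEFT PORT DATE' is indexed by the values of that column, while the original dataframe has a different index (0 to 3). Assignment aligns on index labels, finds no matching labels, and fills the new column with NaN. Use transform instead, which returns a result with the same index as the original dataframe: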

df['RUN STD DEVIATION'] = df.groupby('LEFT PORT DATE')['SPEED'].transform('std')
print(df)
  LEFT PORT DATE  SPEED  RUN STD DEVIATION
0     21/10/2019   10.0           9.950042
1     21/10/2019   20.0           9.950042
2     21/10/2019    0.1           9.950042
3     20/10/2019   50.0                NaN
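
A side note on the NaN in the last row of the output above: 20/10/2019 has only one ship, and pandas computes the sample standard deviation (ddof=1) by default, which is undefined for a single value. The following is a minimal sketch, assuming you would rather get 0 for single-row groups; the variable names by_date, sample_std and population_std are only illustrative, and switching to ddof=0 also changes the values for the larger groups.

```python
import pandas as pd

df = pd.DataFrame({
    'LEFT PORT DATE': ['21/10/2019', '21/10/2019', '21/10/2019', '20/10/2019'],
    'SPEED': [10, 20, 0.10, 50],
})

by_date = df.groupby('LEFT PORT DATE')['SPEED']

# Default: sample standard deviation (ddof=1), NaN for a group with one row.
sample_std = by_date.transform('std')

# Population standard deviation (ddof=0), defined even for a single row.
population_std = by_date.transform(lambda s: s.std(ddof=0))

print(pd.DataFrame({'sample': sample_std, 'population': population_std}))
```

Which of the two is appropriate depends on whether the rows are treated as a sample or as the whole population.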

To further elaborate:

df.groupby('LEFT PORT DATE')['SPEED'].std() returns a Series indexed by LEFT PORT DATE, which does not match the index of the original dataframe. When a Series is assigned to a dataframe column, pandas aligns the values by index label, not by position. The grouped result looks like this:

LEFT PORT DATE
20/10/2019         NaN
21/10/2019    9.950042
Name: SPEED, dtype: float64
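
To make the alignment issue concrete, here is a small sketch that reproduces the question's data and compares three approaches. The names grouped_std, misaligned, via_transform and via_map are only illustrative. The map variant is an alternative not mentioned above: it looks up each row's date in the grouped result and should give the same values as transform.

```python
import pandas as pd

df = pd.DataFrame({
    'LEFT PORT DATE': ['21/10/2019', '21/10/2019', '21/10/2019', '20/10/2019'],
    'SPEED': [10, 20, 0.10, 50],
})

# One value per date; the index holds the dates, not 0..3.
grouped_std = df.groupby('LEFT PORT DATE')['SPEED'].std()

# Direct assignment aligns on index labels. None of the labels 0..3
# appear in grouped_std's index, so every row ends up as NaN.
misaligned = df.copy()
misaligned['RUN STD DEVIATION'] = grouped_std

# transform broadcasts each group's result back onto the original rows.
via_transform = df.groupby('LEFT PORT DATE')['SPEED'].transform('std')

# Alternative: look up each row's date in the grouped result.
via_map = df['LEFT PORT DATE'].map(grouped_std)

print(misaligned)
print(via_transform)
print(via_map)
```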

Upvotes: 1
