Reputation: 911
Pandas beginner here. I would like to calculate the standard deviation of SPEED
for ships that depart on the same date.
I've tried the following, but it returns NaN for every row. Why is that?
The correct answer for 21/10/2019 should be 9.9500418759588.
import pandas as pd
df = pd.DataFrame({ 'LEFT PORT DATE':['21/10/2019','21/10/2019','21/10/2019','20/10/2019'], 'SPEED':[10, 20, 0.10, 50]})
df['RUN STD DEVIATION'] = df.groupby('LEFT PORT DATE')['SPEED'].std()
Upvotes: 1
Views: 349
Reputation: 30940
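This happens because the Series returned by the groupby is indexed by 'LEFT PORT DATE',
while the original DataFrame has a different index (0 to 3). Assigning a Series to a column aligns on the index, none of the labels match, and so every row gets NaN. Use transform instead, which returns a result aligned with the original rows: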
df['RUN STD DEVIATION'] = df.groupby('LEFT PORT DATE')['SPEED'].transform('std')
print(df)
  LEFT PORT DATE  SPEED  RUN STD DEVIATION
0     21/10/2019   10.0           9.950042
1     21/10/2019   20.0           9.950042
2     21/10/2019    0.1           9.950042
3     20/10/2019   50.0                NaN
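As a minimal, self-contained sketch of the transform approach, the snippet below rebuilds the DataFrame from the question and checks the result. The helper variable single_row_is_nan is only illustrative. It also shows why the last row is still NaN: pandas computes the sample standard deviation (ddof=1) by default, which is undefined for a group with a single observation.

```python
import pandas as pd

df = pd.DataFrame({
    'LEFT PORT DATE': ['21/10/2019', '21/10/2019', '21/10/2019', '20/10/2019'],
    'SPEED': [10, 20, 0.10, 50],
})

# transform('std') computes one value per group and broadcasts it back
# onto the original row index, so the result lines up with df.
df['RUN STD DEVIATION'] = df.groupby('LEFT PORT DATE')['SPEED'].transform('std')

# 20/10/2019 has only one ship, so its sample standard deviation (ddof=1)
# is undefined and pandas reports NaN for that row.
single_row_is_nan = pd.isna(df.loc[3, 'RUN STD DEVIATION'])

print(df)
```

If a single-ship date should show 0 rather than NaN, one option is to fill the column afterwards with fillna(0), but whether that is appropriate depends on how the value is used.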
To elaborate:
df.groupby('LEFT PORT DATE')['SPEED'].std()
returns a Series indexed by LEFT PORT DATE,
which is not the same as the index of the original DataFrame. When a Series is assigned to a column, pandas aligns it on the index,
so rows whose label is missing from the Series get NaN. The grouped result looks like this:
LEFT PORT DATE
20/10/2019         NaN
21/10/2019    9.950042
Name: SPEED, dtype: float64
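The index mismatch can be checked directly. The sketch below, which uses the illustrative names per_date and shared_labels, confirms that the grouped Series and the DataFrame share no index labels, and then shows one possible alternative to transform: looking up each row's date in the grouped Series with map.

```python
import pandas as pd

df = pd.DataFrame({
    'LEFT PORT DATE': ['21/10/2019', '21/10/2019', '21/10/2019', '20/10/2019'],
    'SPEED': [10, 20, 0.10, 50],
})

# One standard deviation per date; the index holds the date strings.
per_date = df.groupby('LEFT PORT DATE')['SPEED'].std()

# The DataFrame is labelled 0..3, the grouped Series by date,
# so index-based assignment finds nothing to align on.
shared_labels = set(per_date.index) & set(df.index)
print(shared_labels)  # empty set

# Alternative to transform: map each row's date to its group value.
df['RUN STD DEVIATION'] = df['LEFT PORT DATE'].map(per_date)
print(df)
```

Both approaches should give the same column here; transform is usually the more direct choice when the grouped Series is not needed on its own.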
Upvotes: 1