Reputation: 1987
I have the following pandas dataframe which contains some 5 minute intraday data. DeltaBetweenClose is NaN
for the first trade bar in the day when market opens(9:30 EST).
time Date symbol DeltaBetweenClose
9:35 2017-07-17 spy NaN
9:40 2017-07-17 spy -1.2
..........................................
..........................................
16:00 2018-07-17 spy 1.7
9:35 2017-07-18 spy NaN
9:40 2017-07-18 spy 0.3
..........................................
..........................................
9:35 2018-07-17 nflx NaN
I am trying to create a column CloseDelta_sd
that calculates a rolling standard deviation of DeltaBetweenClose
column grouped by symbol
s that looks into the prior 30 bars and calculates standard deviation while ignoring NaN
s. My following attempt returns all NaN
s. It works when there is only one NaN
at the top of the DeltaBetweenClose
column.
df['CloseDelta_sd'] = df.groupby('symbol').DeltaBetweenClose.transform(lambda x: x.rolling(30).std())
Upvotes: 4
Views: 2751
Reputation: 51165
The issue isn't with std
, since that skips NaN
by default, but instead with rolling
.
You need to make use of the min_periods
parameter:
Minimum number of observations in window required to have a value (otherwise result is NA). For a window that is specified by an offset, this will default to 1.
Since you provide an int
instead of an offset, you will end up with a lot of NaN
groups, the fix is simple:
(df.groupby('symbol').DeltaBetweenClose
.transform(lambda x: x.rolling(30, min_periods=1).std()))
Upvotes: 3