Reputation: 3
I am trying to follow the equations on this paper here , to calculate the historical volatility for power time series data.
The statistical description of the data as follows :
count 9855.000000
mean 291.135088
std 187.503344
min 0.000000
25% 112.408512
50% 332.370871
75% 449.527323
max 601.370058
Here is my implementation:
# Computing Volatility
window_size=27
VOLATILITY = pd.DataFrame()
VOLATILITY['PV']= data
#'intra-hour
# Compute the logarithmic value
VOLATILITY['Log_IA'] = np.log(VOLATILITY['PV'] / VOLATILITY['PV'].shift(1))
# Compute Volatility using the pandas rolling standard deviation function
VOLATILITY['intra-hour'] = VOLATILITY['Log_IA'].rolling(window=window_size).std() * np.sqrt(window_size)# *100
My data have 1 year of half-hourly observations for 27 steps for each day
Here is an example of the output I got for 1 selected day where the NaN problem exists
Timestamp PV Log_IA intra-hour
2019-03-01 06:00:00 0.000000 NaN NaN
2019-03-01 06:30:00 2.946333 inf NaN
2019-03-01 07:00:00 20.963667 1.962229 NaN
2019-03-01 07:30:00 38.284333 0.602250 NaN
2019-03-01 08:00:00 38.224667 -0.001560 NaN
2019-03-01 08:30:00 54.486667 0.354475 NaN
2019-03-01 09:00:00 54.608333 0.002230 NaN
2019-03-01 09:30:00 55.290667 0.012418 NaN
2019-03-01 10:00:00 54.317333 -0.017761 NaN
2019-03-01 10:30:00 54.680333 0.006661 NaN
2019-03-01 11:00:00 42.142333 -0.260451 NaN
2019-03-01 11:30:00 44.569000 0.055986 NaN
2019-03-01 12:00:00 36.988333 -0.186436 NaN
2019-03-01 12:30:00 35.802000 -0.032599 NaN
2019-03-01 13:00:00 29.006667 -0.210478 NaN
2019-03-01 13:30:00 43.254333 0.399572 NaN
2019-03-01 14:00:00 45.246333 0.045024 NaN
2019-03-01 14:30:00 29.768333 -0.418676 NaN
2019-03-01 15:00:00 37.510667 0.231180 NaN
2019-03-01 15:30:00 31.937000 -0.160860 NaN
2019-03-01 16:00:00 39.990333 0.224873 NaN
2019-03-01 16:30:00 32.263000 -0.214717 NaN
2019-03-01 17:00:00 40.707333 0.232487 NaN
2019-03-01 17:30:00 14.551333 -1.028726 NaN
2019-03-01 18:00:00 10.294333 -0.346089 NaN
2019-03-01 18:30:00 2.552667 -1.394455 NaN
2019-03-01 19:00:00 0.036333 -4.252158 NaN
So, why I am getting the NaN as a result for volatility ?
is it a problem with my implementation?
Upvotes: 0
Views: 409
Reputation: 3504
Your implementation is such that you will always get NaN
for the zeroth value of Log_IA
regardless of the values of PV
. This is a result of the shift and can be confirmed by running the following snippet:
import numpy as np
import pandas as pd
VOLATILITY = pd.DataFrame()
VOLATILITY['PV'] = (1.0, 2.0, 3.0)
VOLATILITY['Log_IA'] = np.log(VOLATILITY['PV'] / VOLATILITY['PV'].shift(1))
You will see that VOLATILITY
is:
PV Log_IA
0 1.0 NaN
1 2.0 0.693147
2 3.0 0.405465
The zeroth value of Log_IA
is NaN
because you are dividing by the previous value of
PV
, that is the value of PV
at index -1. This is what shift()
does, but there is no value at index -1 so you get a NaN
. You may set a value to use in place of any non-existent values with the fill_value
argument: shift(1, fill_value=123)
. You will also get inf
for any value of Log_IA
when the previous index's PV
value is 0.
So why do you get NaN
s for all values of intra-hour
? You take the rolling standard deviation of VOLATILITY['Log_IA'].rolling(window=window_size)
which contains inf
at index 0, and standard deviation is undefined for sequences containing inf
. So the rolling standard deviation is NaN
, and any arithmetic operation with NaN
results in NaN
.
Now, you also have an issue with in your code. Logarithm is undefined for values less than or equal to 0, but this is not what is causing your
NaN
s.
https://www.varsitytutors.com/hotmath/hotmath_help/topics/logarithmic-functions.html
Upvotes: 1