user3139545

Reputation: 7374

Need help understanding and fixing pandas volatility implementation

In the book Advances in Financial Machine Learning the code below is shown with the description:

getDailyVol computes the daily volatility at intraday estimation points, applying a span of span0 days to an exponentially weighted moving standard deviation.

def getDailyVol(close,span0=100):
    # daily vol, reindexed to close
    df0 = close.index.searchsorted(close.index-pd.Timedelta(days=1))
    df0 = df0[df0>0]
    df0 = pd.Series(close.index[df0-1], index=close.index[close.shape[0] - df0.shape[0]])  # line 5
    df0 = close.loc[df0.index]/close.loc[df0.values].values-1 # daily returns
    df0 = df0.ewm(span=span0).std()
    return df0

However, when I run this code and pass in a Series of stock closing prices, I get the following error on line 5:

TypeError: Index(...) must be called with a collection of some kind, Timestamp('2014-03-04 09:00:14.213000') was passed

Now my questions are:

  1. Why am I getting this error?
  2. Can you break down the code and explain, line by line, what happens and why? Specifically, what I don't understand is the need for searchsorted and the index on line 5.

Upvotes: 2

Views: 392

Answers (1)

tnf

Reputation: 303

  1. Have you checked the index to make sure it holds datetimes and not strings? The code expects timestamps in the form 2018-07-02 08:30:01.

  2. This is a little complicated. The book emphasizes dollar bars, which are asynchronous, yet it calculates returns over a constant number of days. Hence you never know in advance how many records to look back to calculate a return. The first line in the function returns row positions, adjusted to answer "how many records do I need to look back to reach a bar that is more than n days older?" As mentioned, this is not a constant.

Assume you have bars on Monday at 9:34:00 and 9:35:30. On Tuesday you have bars at 9:33:50, 9:34:10, 9:34:20, and 9:35:40. What happens? Tuesday 9:33:50 gets no return, because no bar is more than 24 hours older. Tuesday 9:34:10 and Tuesday 9:34:20 are both paired with Monday 9:34:00, the last bar more than 24 hours before each of them. Tuesday 9:35:40 is paired with Monday 9:35:30. The searchsorted line produces the row positions needed to make this happen. It is a clever method for applying constant time deltas to asynchronous data.
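
The pairing in this example can be checked directly. The snippet below is a minimal sketch, assuming the Monday is 2018-07-02 (a date picked only for illustration); the names idx, pos and pairs are mine, not the book's. It runs the same searchsorted lookup on the six timestamps and shows which earlier bar each bar is matched with.

```python
import pandas as pd

# The six bar timestamps from the example (2018-07-02 is assumed as the Monday).
idx = pd.DatetimeIndex([
    "2018-07-02 09:34:00", "2018-07-02 09:35:30",  # Monday
    "2018-07-03 09:33:50", "2018-07-03 09:34:10",  # Tuesday
    "2018-07-03 09:34:20", "2018-07-03 09:35:40",
])

# For each bar t: the position where (t - 1 day) would be inserted in the index.
pos = idx.searchsorted(idx - pd.Timedelta(days=1))
# Position 0 means no bar is older than (t - 1 day), so no return can be formed.
pos = pos[pos > 0]

# Label = current bar, value = last bar strictly before (t - 1 day).
# The zeros are all at the front, so the kept bars are the last len(pos) labels.
pairs = pd.Series(idx[pos - 1], index=idx[len(idx) - len(pos):])
print(pairs)
```

The printed Series has one row per bar that has a counterpart more than a day older; both Monday bars and the first Tuesday bar are absent because nothing in the index is old enough for them.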

Sorry to be long-winded. The code has been well tested and is sound. Check your datetime and you should be good to go.
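
One more thing worth checking on question 1, assuming the listing in the question was typed in by hand: on line 5 the index expression is close.index[close.shape[0] - df0.shape[0]] with no trailing colon. Indexing with a plain integer returns a single Timestamp rather than a slice of the index, which would match the TypeError quoted. The sketch below restores the colon. The name get_daily_vol, the pd.to_datetime guard and the synthetic prices are my own additions for illustration, not part of the book's listing.

```python
import numpy as np
import pandas as pd

def get_daily_vol(close: pd.Series, span0: int = 100) -> pd.Series:
    """Sketch of getDailyVol with the slice colon restored (see note above)."""
    # Guard (an addition, not in the listing): make the index datetimes, not strings.
    close = close.copy()
    close.index = pd.to_datetime(close.index)
    # For every bar t, the position where (t - 1 day) would sit in the index.
    pos = close.index.searchsorted(close.index - pd.Timedelta(days=1))
    pos = pos[pos > 0]  # drop bars with nothing more than one day older
    # Trailing ':' makes this a slice of the index, not a single Timestamp.
    prev = pd.Series(close.index[pos - 1],
                     index=close.index[close.shape[0] - pos.shape[0]:])
    # Return from the earlier bar to the current bar.
    ret = close.loc[prev.index] / close.loc[prev.values].values - 1
    # Exponentially weighted moving standard deviation of those returns.
    return ret.ewm(span=span0).std()

# Usage on synthetic, made-up prices: 500 bars spaced 37 minutes apart.
idx = pd.date_range("2018-07-02 08:30", periods=500, freq="37min")
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(0.001 * rng.normal(size=500).cumsum()), index=idx)
vol = get_daily_vol(close)
print(vol.tail())
```

The first entry of the result is NaN because a standard deviation needs at least two returns; bars from the first day are missing entirely since they have no counterpart a day earlier.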

Upvotes: 2
