window (bucketing) by time for rolling_* in Pandas

Question

In Pandas, as far as I am aware, the rolling_* methods do not contain a way of specifying a range (in this case a time range) as a window/bucket.

I have seen a similar question here:Pandas: rolling mean by time interval I understand I could resample the data, but this is not ideal with large datasets, particularly if the window size is relatively small. Similar issues exist with the solution here:pandas rolling computation with window based on values instead of counts and Compute EWMA over sparse/irregular TimeSeries in Pandas

Imagine if I wanted to calculate a Volume Weighted Average Price (VWAP) on a months worth of tick data with a small time horizon on the VWAP. Resampling the data result in quiet market periods being populated with rows upon rows of zero values, expanding the dataset to oblivion.

A small sample dataset (with code) is provided below.

from StringIO import StringIO
from datetime import date, datetime, time
from pytz import timezone
import pandas as pd

s = """TIMESTAMP_DT,PRICE,QTY
2015-09-08 10:24:16.671862751+10:00,97.295,2
2015-09-08 10:25:33.952672310+10:00,97.3,4
2015-09-08 10:38:30.840283893+10:00,97.3,3
2015-09-08 11:00:47.536800660+10:00,97.305,1
2015-09-08 11:00:47.536896273+10:00,97.305,2
"""
SYD = timezone('Australia/Sydney')

df1 = pd.read_csv(StringIO(s), sep=',', index_col = 0)
df1.index =  pd.to_datetime(df1.index)
df1.index = df1.index.tz_localize('UTC').tz_convert(SYD)


                                      PRICE  QTY
TIMESTAMP_DT                                    
2015-09-08 10:24:16.671862751+10:00  97.295    2
2015-09-08 10:25:33.952672310+10:00  97.300    4
2015-09-08 10:38:30.840283893+10:00  97.300    3
2015-09-08 11:00:47.536800660+10:00  97.305    1
2015-09-08 11:00:47.536896273+10:00  97.305    2

I could easily get a volume scaled price by doing something like

df1['Volume_Scaled_Price'] = df1['PRICE'] * df1['QTY']

Using some of the pandas rolling methods, if I were able to specify a rolling TIME window (possibly as a time delta), would look something like:

df1['VWAP'] = 
    rolling_sum(df1['Volume_Scaled_Price'], window = timedelta(minute = 5), min_periods = 1)
    / rolling_sum(df1['QTY'], window = timedelta(minute = 5), min_periods = 1)

Does anybody know an efficient way of achieving rolling windows, specifying time buckets?

rs311 · Accepted Answer

Not sure if you ended up figuring out a solution, but I recently asked a similar question. It was pointed out that pandas 0.19.0 now has support for Time-aware Rolling.

I think that you should be able to perform your rolling calculation on 5 min windows with the below:

df1['VWAP'] = df1['Volume_Scaled_Price'].rolling('5min').sum() / df1['QTY'].rolling('5min').sum()

Also - here is a list of the offset aliases that are currently supported.

http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases

window (bucketing) by time for rolling_* in Pandas

Answers (1)

Related Questions