Reputation: 23
In Pandas, as far as I am aware, the rolling_* methods do not contain a way of specifying a range (in this case a time range) as a window/bucket.
I have seen a similar question here:Pandas: rolling mean by time interval I understand I could resample the data, but this is not ideal with large datasets, particularly if the window size is relatively small. Similar issues exist with the solution here:pandas rolling computation with window based on values instead of counts and Compute EWMA over sparse/irregular TimeSeries in Pandas
Imagine if I wanted to calculate a Volume Weighted Average Price (VWAP) on a months worth of tick data with a small time horizon on the VWAP. Resampling the data result in quiet market periods being populated with rows upon rows of zero values, expanding the dataset to oblivion.
A small sample dataset (with code) is provided below.
from StringIO import StringIO
from datetime import date, datetime, time
from pytz import timezone
import pandas as pd
s = """TIMESTAMP_DT,PRICE,QTY
2015-09-08 10:24:16.671862751+10:00,97.295,2
2015-09-08 10:25:33.952672310+10:00,97.3,4
2015-09-08 10:38:30.840283893+10:00,97.3,3
2015-09-08 11:00:47.536800660+10:00,97.305,1
2015-09-08 11:00:47.536896273+10:00,97.305,2
"""
SYD = timezone('Australia/Sydney')
df1 = pd.read_csv(StringIO(s), sep=',', index_col = 0)
df1.index = pd.to_datetime(df1.index)
df1.index = df1.index.tz_localize('UTC').tz_convert(SYD)
PRICE QTY
TIMESTAMP_DT
2015-09-08 10:24:16.671862751+10:00 97.295 2
2015-09-08 10:25:33.952672310+10:00 97.300 4
2015-09-08 10:38:30.840283893+10:00 97.300 3
2015-09-08 11:00:47.536800660+10:00 97.305 1
2015-09-08 11:00:47.536896273+10:00 97.305 2
I could easily get a volume scaled price by doing something like
df1['Volume_Scaled_Price'] = df1['PRICE'] * df1['QTY']
Using some of the pandas rolling methods, if I were able to specify a rolling TIME window (possibly as a time delta), would look something like:
df1['VWAP'] =
rolling_sum(df1['Volume_Scaled_Price'], window = timedelta(minute = 5), min_periods = 1)
/ rolling_sum(df1['QTY'], window = timedelta(minute = 5), min_periods = 1)
Does anybody know an efficient way of achieving rolling windows, specifying time buckets?
Upvotes: 2
Views: 3343
Reputation: 363
Not sure if you ended up figuring out a solution, but I recently asked a similar question. It was pointed out that pandas 0.19.0 now has support for Time-aware Rolling.
I think that you should be able to perform your rolling calculation on 5 min windows with the below:
df1['VWAP'] = df1['Volume_Scaled_Price'].rolling('5min').sum() / df1['QTY'].rolling('5min').sum()
Also - here is a list of the offset aliases that are currently supported.
http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
Upvotes: 3