Reputation: 1956
This seems like a simple question (and answer) but I'm having trouble with it.
The issue:
I have a pandas dataframe full of OHLC data. I want to find the rolling 52 week high throughout the dataframe.
My dataset is from yahoo. You can pull the same data down with the folllowing code to get daily data:
import pandas.io.data as web
df = web.DataReader('SPX', 'yahoo', start, end)
A tail of the data gives the output below:
Open High Low Close Volume
Date
2016-07-15 216.779999 217.009995 215.309998 215.830002 107155400
2016-07-18 215.970001 216.600006 215.669998 216.410004 58725900
2016-07-19 215.919998 216.229996 215.630005 216.190002 54345700
2016-07-20 216.190002 217.369995 216.190002 217.089996 58159500
2016-07-21 216.960007 217.220001 215.750000 216.270004 66070000
To get the 52 week high (rolling), I can run the following:
df["52weekhigh"] = pd.rolling_max(df.High, window=200, min_periods=1)
I get the following (some col:
High 52weekhigh
Date
2016-07-15 217.009995 217.009995
2016-07-18 216.600006 217.009995
2016-07-19 216.229996 217.009995
2016-07-20 217.369995 217.369995
2016-07-21 217.220001 217.369995
This gives me an values for the 52 week highs as new highs come in, but I'm not a fan of using 200 here. Should it be 200 or 201 or 220 (there are "approximately" 200 trading days in the year)?
I could resample the data to weekly to get the values, but then i can't easily get back to my original daily data (or can I?).
So...here's the question:
Is there a way to run rolling_max on pandas dataframes and set the window to '52 weeks' or something similar? If not, can anyone think of a better approach to this than the above?
Upvotes: 2
Views: 8746
Reputation: 879471
If your data has business-day frequency then there should be roughly 5 rows per week.
So 52 weeks would roughly correspond to window=52*5
.
Of course, there might be a few other days missing due to holidays. To be more
accurate, you could use asfreq('D')
to change the frequency from business days
to actual days. Then you could use a rolling window of size 52*7
:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(2016)
N = 1000
index = pd.date_range('2000-1-1', periods=N, freq='B')
data = (np.random.random((N, 1))-0.5).cumsum(axis=0)
df = pd.DataFrame(data, index=index, columns=['price'])
# result = pd.rolling_max(df.asfreq('D'), window=52*7, min_periods=1) # for older versions of Pandas
result = df.asfreq('D').rolling(window=52*7, min_periods=1).max()
result = result.rename(columns={'price':'rolling max'})
ax = df.plot()
result.plot(ax=ax)
plt.show()
Upvotes: 9