Reputation: 7358
I have Yahoo stock data that I would like to manipulate, like so:
import pandas as pd
import pandas.io.data as web
data = web.DataReader('SPY','yahoo')
data.head()
Out[13]:
              Open    High     Low   Close     Volume  Adj Close
Date
2010-01-04  112.37  113.39  111.51  113.33  118944600     103.44
2010-01-05  113.26  113.68  112.85  113.63  111579900     103.71
2010-01-06  113.52  113.99  113.43  113.71  116074400     103.79
2010-01-07  113.50  114.33  113.18  114.19  131091100     104.23
2010-01-08  113.89  114.62  113.66  114.57  126402800     104.57
For any given date, I would like to look forward 2 days and find the lowest Low quote in that window. So, for 2010-01-04, the correct answer would be 112.85.
Now, I could iterate over all the dates with a for loop and get what I want, but I would like to figure out whether I can do this in a vectorized manner, perhaps by using rolling_apply with a lambda function. This is what I have done so far:
def foo(x):
    today = x[0]
    forward = x[1:]
    return (forward.min())
pd.rolling_apply(data,2,foo)
This does not work, since rolling_apply works on a Series and does not have access to the other columns of the data frame.
Is there some neat way to do this?
Upvotes: 1
Views: 153
Reputation: 394439
Rather than calling rolling_apply on the whole dataframe, just call it on the column of interest and call min:
pd.rolling_apply(data['Low'],2,min)
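Note that a window of 2 looks backward (current row plus the previous one), and that later pandas releases replaced the top-level pd.rolling_* functions with the .rolling() method. Here is a minimal sketch of one way to get the forward-looking minimum the question asks for, assuming a recent pandas version. The low Series is built by hand from the rows in the question rather than downloaded, and the names trailing_min and forward_min are just illustrative:

```python
import pandas as pd

# Low column copied from the data.head() output in the question
low = pd.Series(
    [111.51, 112.85, 113.43, 113.18, 113.66],
    index=pd.to_datetime(
        ["2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07", "2010-01-08"]
    ),
    name="Low",
)

# Trailing window: min of the current row and the previous row
trailing_min = low.rolling(2).min()

# Forward-looking: shift the trailing result back by 2 rows, so each date
# gets the min of the next two rows (the date itself is excluded)
forward_min = low.rolling(2).min().shift(-2)

print(forward_min)
```

For 2010-01-04 this gives 112.85, matching the expected answer in the question. The last two dates come out as NaN because there are not two following rows to look at.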
Interestingly, the built-in min function outperforms NumPy's min, which is perhaps not that surprising given that all we are doing is finding the lowest value of a 2-element array:
In [26]:
%timeit pd.rolling_apply(data['Low'],2,np.min)
%timeit pd.rolling_apply(data['Low'],2,min)
10 loops, best of 3: 15.4 ms per loop
1000 loops, best of 3: 1.44 ms per loop
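The same comparison can be reproduced outside IPython with the standard timeit module. This is only a rough sketch, assuming each window arrives as a small NumPy array. The window values and the variable names are made up for illustration, and the timings will differ by machine and library version:

```python
import timeit

import numpy as np

# A two-element window, similar in size to what a window of 2 produces
window = np.array([112.85, 113.43])

# Both functions agree on the result
assert min(window) == np.min(window)

# Time each call many times; np.min carries per-call overhead that
# dominates when the array is this small
t_builtin = timeit.timeit(lambda: min(window), number=10_000)
t_numpy = timeit.timeit(lambda: np.min(window), number=10_000)

print(f"built-in min: {t_builtin:.4f} s, np.min: {t_numpy:.4f} s")
```

On larger windows the balance would likely tip the other way, since np.min runs its reduction in compiled code.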
Upvotes: 2