Reputation: 615
I have an array like this
sp500=Quandl.get("YAHOO/INDEX_GSPC")
Open High Low Close Volume Adjusted Close returns vols
Date
1950-05-26 18.67 18.67 18.67 18.67 1330000 18.67 -0.001070 0.091246
1950-05-29 18.72 18.72 18.72 18.72 1110000 18.72 0.002678 0.078494
1950-05-31 18.78 18.78 18.78 18.78 1530000 18.78 0.003205 0.073638
1950-06-01 18.77 18.77 18.77 18.77 1580000 18.77 -0.000532 0.069189
1950-06-02 18.79 18.79 18.79 18.79 1450000 18.79 0.001066 0.059300
At any date I want to find the days since vol was lower than 5%. So for instance at 1950-05-26, I will start searching backwards until I find a vol < 5% and calculate the day difference between that day and 1950-05-26. The exact functionality is simply "Days Since" this happened!
Is there any easier way to do this?
What I had in mind was using np.where(x<0.10) and then using the index to calculate day difference. np.where can be inside pd.rolling_apply for a window of 100 indices backwards. yes 100 indices backward will be the assumption of max lookup before it finds the sweet spot.
Any better ideaz than my extremely crude one highlighted above???
Upvotes: 1
Views: 1635
Reputation: 78001
Assuming that your data-frame is sorted in the index, you may combine numpy.where
and Series.fillna
to obtain the last day which vols where below some threshold. for example, starting with
>>> df
vols
Date
2014-07-10 0.045
2014-07-11 0.057
2014-07-12 0.064
2014-07-13 0.003
2014-07-14 0.021
2014-07-15 0.052
2014-07-16 0.090
this would be
>>> df['tick'] = np.where(df.vols < .05, df.index, pd.tslib.NaT)
>>> df
vols tick
Date
2014-07-10 0.045 2014-07-10
2014-07-11 0.057 NaN
2014-07-12 0.064 NaN
2014-07-13 0.003 2014-07-13
2014-07-14 0.021 2014-07-14
2014-07-15 0.052 NaN
2014-07-16 0.090 NaN
and then forward fill, and obtain the day difference with respect to index:
>>> df['tick'].fillna(method='ffill', inplace=True)
>>> df
vols tick
Date
2014-07-10 0.045 2014-07-10
2014-07-11 0.057 2014-07-10
2014-07-12 0.064 2014-07-10
2014-07-13 0.003 2014-07-13
2014-07-14 0.021 2014-07-14
2014-07-15 0.052 2014-07-14
2014-07-16 0.090 2014-07-14
>>> df['days'] = df.index.values - df['tick']
>>> df
vols tick days
Date
2014-07-10 0.045 2014-07-10 0 days
2014-07-11 0.057 2014-07-10 1 days
2014-07-12 0.064 2014-07-10 2 days
2014-07-13 0.003 2014-07-13 0 days
2014-07-14 0.021 2014-07-14 0 days
2014-07-15 0.052 2014-07-14 1 days
2014-07-16 0.090 2014-07-14 2 days
Note that you need .values
in the last step otherwise -
would do sort of set difference operation.
Upvotes: 3