coffeequant
coffeequant

Reputation: 615

In python/pandas how to search for nearest value?

I have an array like this

sp500=Quandl.get("YAHOO/INDEX_GSPC")


    Open    High    Low     Close   Volume  Adjusted Close  returns     vols
Date                                
1950-05-26  18.67   18.67   18.67   18.67   1330000     18.67   -0.001070   0.091246
1950-05-29  18.72   18.72   18.72   18.72   1110000     18.72   0.002678    0.078494
1950-05-31  18.78   18.78   18.78   18.78   1530000     18.78   0.003205    0.073638
1950-06-01  18.77   18.77   18.77   18.77   1580000     18.77   -0.000532   0.069189
1950-06-02  18.79   18.79   18.79   18.79   1450000     18.79   0.001066    0.059300

At any date I want to find the days since vol was lower than 5%. So for instance at 1950-05-26, I will start searching backwards until I find a vol < 5% and calculate the day difference between that day and 1950-05-26. The exact functionality is simply "Days Since" this happened!

Is there any easier way to do this?

What I had in mind was using np.where(x<0.10) and then using the index to calculate day difference. np.where can be inside pd.rolling_apply for a window of 100 indices backwards. yes 100 indices backward will be the assumption of max lookup before it finds the sweet spot.

Any better ideaz than my extremely crude one highlighted above???

Upvotes: 1

Views: 1635

Answers (1)

behzad.nouri
behzad.nouri

Reputation: 78001

Assuming that your data-frame is sorted in the index, you may combine numpy.where and Series.fillna to obtain the last day which vols where below some threshold. for example, starting with

>>> df
             vols
Date             
2014-07-10  0.045
2014-07-11  0.057
2014-07-12  0.064
2014-07-13  0.003
2014-07-14  0.021
2014-07-15  0.052
2014-07-16  0.090

this would be

>>> df['tick'] = np.where(df.vols < .05, df.index, pd.tslib.NaT)
>>> df
             vols        tick
Date                         
2014-07-10  0.045  2014-07-10
2014-07-11  0.057         NaN
2014-07-12  0.064         NaN
2014-07-13  0.003  2014-07-13
2014-07-14  0.021  2014-07-14
2014-07-15  0.052         NaN
2014-07-16  0.090         NaN

and then forward fill, and obtain the day difference with respect to index:

>>> df['tick'].fillna(method='ffill', inplace=True)
>>> df
             vols        tick
Date                         
2014-07-10  0.045  2014-07-10
2014-07-11  0.057  2014-07-10
2014-07-12  0.064  2014-07-10
2014-07-13  0.003  2014-07-13
2014-07-14  0.021  2014-07-14
2014-07-15  0.052  2014-07-14
2014-07-16  0.090  2014-07-14

>>> df['days'] = df.index.values - df['tick']
>>> df
             vols        tick   days
Date                                
2014-07-10  0.045  2014-07-10 0 days
2014-07-11  0.057  2014-07-10 1 days
2014-07-12  0.064  2014-07-10 2 days
2014-07-13  0.003  2014-07-13 0 days
2014-07-14  0.021  2014-07-14 0 days
2014-07-15  0.052  2014-07-14 1 days
2014-07-16  0.090  2014-07-14 2 days

Note that you need .values in the last step otherwise - would do sort of set difference operation.

Upvotes: 3

Related Questions