Python percentile of recent value vs window of previous values

Question

Apologies I am a noob looking to transition from R!

Reproducible data example;

df = pd.DataFrame(1.26 + np.random.rand(size)/100.0,
                  index=pd.date_range('20160101 09:00:00',
                                      periods=size,
                                      freq='60s'),
                  columns=['ATR20'])

I would like to take a value in the column ATR20 and compute its current percentile against rolling window of the previous n values of column ATR20.

Heres as far as I got:

for n in range(1,len(df)):
print(sum(df.ATR20[n:n+20] > df.ATR20))

Which gives the following error:

ValueError: Can only compare identically-labeled Series objects

I am sure that there is an easier way to do this with neater syntax but my searches haven't found anything close enough.

Thanks.

Engineero · Accepted Answer

I think this will get what you're looking for. I used your example with only 10 elements in the dataframe and a rolling window of 3 elements, just so it is shorter when pasted here.

size = 10
df = pd.DataFrame(1.26 + np.random.rand(size)/100.0,
                  index=pd.date_range('20160101 09:00:00',
                                      periods=size,
                                      freq='60s'),
                  columns=['ATR20'])

#                         ATR20
# 2016-01-01 09:00:00  1.262522
# 2016-01-01 09:01:00  1.265116
# 2016-01-01 09:02:00  1.265051
# 2016-01-01 09:03:00  1.261109
# 2016-01-01 09:04:00  1.262699
# 2016-01-01 09:05:00  1.266710
# 2016-01-01 09:06:00  1.260186
# 2016-01-01 09:07:00  1.268001
# 2016-01-01 09:08:00  1.263227
# 2016-01-01 09:09:00  1.261331

You can use df.rolling along with apply to apply a function to a rolling window. I think the calculation I used here is more or less what you are looking for:

window = 3
df.rolling(window).apply(lambda x: sum([x[-1] > i for i in x]) / (window - 1))

#                         ATR20
# 2016-01-01 09:00:00       NaN
# 2016-01-01 09:01:00       NaN
# 2016-01-01 09:02:00  0.263665
# 2016-01-01 09:03:00  0.000000
# 2016-01-01 09:04:00  0.263932
# 2016-01-01 09:05:00  0.527632
# 2016-01-01 09:06:00  0.000000
# 2016-01-01 09:07:00  0.527024
# 2016-01-01 09:08:00  0.263754
# 2016-01-01 09:09:00  0.000000

Python percentile of recent value vs window of previous values

Answers (1)

Related Questions