Calculate
Calculate

Reputation: 343

Rolling Percentile - Pandas

This gives Percentile Values for a column below,

df[column].rank(pct=True)

But is there a way to get the rolling percentile Values for a column by using this similar method, without writing functions or loops and longer code? Currently when adding the .rolling(window), it is not working.

If there is not, Pandas desperately needs to add this as a feature for the .rolling(), it is there for .quantile(), it makes no sense to not have it for the percentiles as well.

Upvotes: 3

Views: 1792

Answers (1)

Fredaroo
Fredaroo

Reputation: 444

While waiting for Rolling rank to be added in pandas 1.4.0, one way to do this could be like so :

import pandas as pd

df[column].rolling(window).apply(lambda x: len(x[x <= x.iloc[-1]]) / len(x))

Where window is the window on which you sought to roll. This should give you the same result as if you were using df[column].rank(pct=True) on a dataframe equal to the window at hand.

The lambda function divides the number of occurences under or equal the last value by the total number of value, giving us the percentage of scores fall below the current row.

Note: Values under the window size will return NaN. For example the first value for a window=2.


Example :

n = [5, 10, 9, 7, 6, 9, 4]
df = pd.DataFrame(n, columns=['Scores'])
df['rank_result'] = df['Scores'].rank(pct=True)
df['rolling_apply'] = df['Scores'].rolling(7).apply(lambda x: len(x[x <= x.iloc[-1]]) / len(x))
print(df[6:7]) # get the last row to compare

This will give the following output for the last row :

   Scores  rank_result  rolling_apply
6       4     0.142857       0.142857

Upvotes: 1

Related Questions