gabboshow

Reputation: 5559

calculate percentile using rolling window pandas

I create a pandas dataframe as

import pandas as pd
df = pd.DataFrame(data=[[1],[2],[3],[1],[2],[3],[1],[2],[3]])
df
Out[19]: 
   0
0  1
1  2
2  3
3  1
4  2
5  3
6  1
7  2
8  3

I calculate the 75th percentile over rolling windows of length 3

df.rolling(window=3,center=False).quantile(0.75)
Out[20]: 
     0
0  NaN
1  NaN
2  2.0
3  2.0
4  2.0
5  2.0
6  2.0
7  2.0
8  2.0

Then, just to check, I calculate the 75th percentile of the first window separately

df.iloc[0:3].quantile(0.75)
Out[22]: 
0    2.5
Name: 0.75, dtype: float64

Why do I get a different value?

Upvotes: 2

Views: 8401

Answers (1)

cs95

Reputation: 402493

This is a bug, referenced in GH9413 and GH16211.

The reason, as given by the devs:

It looks like the difference here is that quantile and percentile take the weighted average of the nearest points, whereas rolling_quantile simply uses one of the nearest points (no averaging).

Rolling.quantile did not interpolate when computing the quantiles.
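
To make the difference concrete, here is a minimal sketch in plain Python of the two rules applied to the first window. The helper names `quantile_linear` and `quantile_lower` are made up for illustration, and picking the lower neighbour is an assumption that happens to reproduce the 2.0 from the question; it is not necessarily the exact rule the old rolling code used.

```python
import math

def quantile_linear(values, q):
    """Interpolate linearly between the two nearest sorted points."""
    s = sorted(values)
    pos = q * (len(s) - 1)            # fractional position in the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def quantile_lower(values, q):
    """Take the sorted point at or below the fractional position (no averaging)."""
    s = sorted(values)
    return s[math.floor(q * (len(s) - 1))]

window = [1, 2, 3]
print(quantile_linear(window, 0.75))  # 2.5, halfway between 2 and 3
print(quantile_lower(window, 0.75))   # 2
```

For q = 0.75 and three points, the position is 0.75 * 2 = 1.5, which falls between the sorted values 2 and 3. Interpolating gives 2.5, while taking a single neighbour gives 2.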

The bug has been fixed as of 0.21.


For older versions, the workaround is to compute the quantile through a rolling apply:

df.rolling(window=3, center=False).apply(lambda x: pd.Series(x).quantile(0.75))

     0
0  NaN
1  NaN
2  2.5
3  2.5
4  2.5
5  2.5
6  2.5
7  2.5
8  2.5
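
On a recent pandas, the built-in and the workaround should agree. The check below is a sketch that assumes a version where `Rolling.quantile` accepts the `interpolation` keyword (as far as I know, 0.23 or later); the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(data=[[1], [2], [3], [1], [2], [3], [1], [2], [3]])

# Built-in rolling quantile (interpolates linearly by default on fixed versions)
builtin = df.rolling(window=3).quantile(0.75)

# Workaround from above: delegate each window to Series.quantile
workaround = df.rolling(window=3).apply(lambda x: pd.Series(x).quantile(0.75))

# Asking for no averaging should give back the value seen in the question
lower = df.rolling(window=3).quantile(0.75, interpolation="lower")

print(builtin[0].tolist())
print(workaround[0].tolist())
print(lower[0].tolist())
```

If the first two lists match and the third shows 2.0 after the two leading NaN entries, the installed version has the fix and the apply workaround is no longer needed.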

Upvotes: 7
