Prikers

Reputation: 958

Python Pandas rolling functions

I am not sure I understand the parameter min_periods in Pandas rolling functions: why does it have to be smaller than the window parameter? I would like to compute (for instance) the rolling max minus the rolling min with a window of ten values, but I want to wait for maybe 20 values before starting computations:

In[1]:  import pandas as pd
In[2]:  import numpy as np
In[3]:  df = pd.DataFrame(columns=['A','B'], data=np.random.randint(low=0,high=100,size=(100,2)))
In[4]:  roll = df['A'].rolling(window=10, min_periods=20)
In[5]:  df['C'] = roll.max() - roll.min()

In[6]:  roll
Out[6]: Rolling [window=10,min_periods=20,center=False,axis=0]

In[7]:  df['C'] = roll.max()-roll.min()

I get the following error:

ValueError: Invalid min_periods size 20 greater than window 10

I thought that min_periods was there to tell how many values the function had to wait before starting computations. The documentation says:

min_periods : int, default None

Minimum number of observations in window required to have a value (otherwise result is NA)

I had not paid attention to the "in window" detail here. So what would be the most efficient way to achieve what I am trying to do? Should I do something like:

roll = df.loc[20:,'A'].rolling(window=10)
df['C'] = roll.max() - roll.min()

Is there a more efficient way?

Upvotes: 3

Views: 9062

Answers (2)

ℕʘʘḆḽḘ

Reputation: 19405

The min_periods = n option simply means that you require at least n valid (non-missing) observations inside the window to compute your rolling stats.

For example, suppose min_periods = 5 and you compute a rolling mean over the last 10 observations. What happens if 6 of the last 10 observations are actually missing values? Since only 4 non-missing values remain and you require at least 5 (4 < 5), the rolling mean will be missing as well.
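A minimal sketch of that situation, using a made-up series of ten observations of which six are NaN (the data and the variable names `strict` and `relaxed` are illustrative only):

```python
import numpy as np
import pandas as pd

# Ten observations, six of them missing: only 4 valid values in total.
s = pd.Series([1.0, np.nan, np.nan, 2.0, np.nan, np.nan, 3.0, np.nan, np.nan, 4.0])

# Window of 10 that requires at least 5 valid observations.
# The full window only ever holds 4 valid values, so every result is NaN.
strict = s.rolling(window=10, min_periods=5).mean()

# Same window, but only 4 valid observations are required.
# The last row now sees 1, 2, 3 and 4 and returns their mean.
relaxed = s.rolling(window=10, min_periods=4).mean()

print(strict.iloc[-1])   # nan
print(relaxed.iloc[-1])  # 2.5
```

Note that min_periods counts non-missing values inside the window, which is why it cannot meaningfully exceed the window size.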

It's a very, very important option.

From the documentation:

min_periods : int, default None

Minimum number of observations in window required to have a value (otherwise result is NA).

Upvotes: 13

Steven G

Reputation: 17152

The min_periods argument is just a way to apply the function to a smaller sample than the rolling window. Say you want the rolling minimum over a window of 10: passing min_periods=5 lets pandas calculate the min of the first 5 data points, then the first 6, then 7, 8, 9 and finally 10. From then on, because it has at least 10 data points, pandas can roll its 10-point window and keeps the window size at 10.
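This warm-up behaviour can be checked on a small increasing series. The sketch below also shows one possible way to get what the question asks for (a window of 10 whose first output only appears at the 20th value) by computing the full rolling result and blanking the early rows. The `warmup` variable is a hypothetical name introduced here, not a pandas parameter.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(1, 21), dtype="float64")  # 1.0, 2.0, ..., 20.0

# Window of 10, but results start as soon as 5 observations are available.
rolling_min = s.rolling(window=10, min_periods=5).min()

print(rolling_min.iloc[3])   # nan  (only 4 observations so far)
print(rolling_min.iloc[4])   # 1.0  (min of the first 5 values)
print(rolling_min.iloc[10])  # 2.0  (window now slides: values 2..11)

# Assumption: "wait 20 values" means the first output appears at the 20th row.
warmup = 20  # hypothetical name, not part of the pandas API
roll = s.rolling(window=10)
delayed = roll.max() - roll.min()
delayed.iloc[:warmup - 1] = np.nan  # blank out everything before the 20th row

print(delayed.iloc[18])  # nan
print(delayed.iloc[19])  # 9.0  (max 20 minus min 11)
```

Compared with slicing first via df.loc[20:, 'A'], this masks the output rather than the input, so the first value appears at row 19 instead of row 29.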

Upvotes: 4
