Ankhnesmerira
Ankhnesmerira

Reputation: 1430

Rolling mean/sum doesn't skip NAs in Pandas dataframe

I'm trying to get the rolling mean of a column in a Pandas dataframe, and I can't force the mean to skip the NAs. I have the same issue with the rolling sum:

import pandas as pd

test = pd.DataFrame({'A': [1, 2, 3, 10, np.nan, 5 , 20, 6, 7, np.nan, np.nan, np.nan]})
test['rolling_mean'] = test.A.rolling(3, min_periods=3).mean(skipna = True).reset_index(0,drop=True)
test['rolling_sum'] = test.A.rolling(3, min_periods=3).sum(skipna = True).reset_index(0,drop=True)
test


    A   rolling_mean    rolling_sum     rolling_mean_desired    rolling_sum_desired
0   1       NaN             NaN             NaN                 NaN
1   2       NaN             NaN             NaN                 NaN
2   3       2               6               2                   6
3   10      5               15              5                   15
4   NaN     NaN             NaN             6.5                 13
5   5       NaN             NaN             7.5                 15
6   20      NaN             NaN             12.5                25
7   6       10.333333       31              10.33333333         31
8   7       11              33              11                  33
9   Nan     NaN             NaN             6.5                 13
10  Nan     NaN             NaN             7                   7
11  Nan     NaN             NaN             NaN                 NaN

In the example above, I do want the first two entries to be NaN (that's why I defined the min_period the same as my window. But rows 4,5,6,9,10 shouldn't be NAs. Row 11 should be NaN as all three rows are NaNs.

Upvotes: 0

Views: 472

Answers (1)

user16836078
user16836078

Reputation:

As the documentation in the Pandas website said, the min_periods is the minimum number of observations in window required to have a value. So what you need to do is just to make the min_periods as 1, not 3.

test.A.rolling(3,min_periods=1).mean()
Out[120]: 
0      1.000000
1      1.500000
2      2.000000
3      5.000000
4      6.500000
5      7.500000
6     12.500000
7     10.333333
8     11.000000
9      6.500000
10     7.000000
11          NaN
Name: A, dtype: float64

test.A.rolling(3,min_periods=1).sum()
Out[122]: 
0      1.0
1      3.0
2      6.0
3     15.0
4     13.0
5     15.0
6     25.0
7     31.0
8     33.0
9     13.0
10     7.0
11     NaN
Name: A, dtype: float64

And if you want to change the first two row to become NaN, you can manually do it after performing the rolling operation.

test.loc[0:1, ['rolling_mean','rolling_sum']] = np.nan

test
Out[142]: 
       A  rolling_mean  rolling_sum
0    1.0           NaN          NaN
1    2.0           NaN          NaN
2    3.0      2.000000          6.0
3   10.0      5.000000         15.0
4    NaN      6.500000         13.0
5    5.0      7.500000         15.0
6   20.0     12.500000         25.0
7    6.0     10.333333         31.0
8    7.0     11.000000         33.0
9    NaN      6.500000         13.0
10   NaN      7.000000          7.0
11   NaN           NaN          NaN

Upvotes: 2

Related Questions