Rolling mean on a groupby with min_periods=1, but NOT ignoring NANs

So far, I've only seen questions being asked about how to ignore NANs while doing a rolling mean on a groupby. But my case is the opposite. I want to include the NANs such that if even one of the values in the rolling windows is NAN, I want the resulting rolling mean to be NAN as well.

Input:

    grouping    value_to_avg
0   1           1.0
1   1           2.0
2   1           3.0
3   1           NaN
4   1           4.0
5   2           5.0
6   2           NaN
7   2           6.0
8   2           7.0
9   2           8.0

Code to create sample input:

data = {'grouping': [1,1,1,1,1,2,2,2,2,2], 'value_to_avg': [1,2,3,np.nan,4,5,np.nan,6,7,8]}  
db = pd.DataFrame(data)

Code that I have tried:

db['rolling_mean_actual'] = db.groupby('grouping')['value_to_avg'].transform(lambda s: s.rolling(window=3, center=True, min_periods=1).mean(skipna=False))

Actual vs. expected output:

  grouping  value_to_avg    rolling_mean_actual rolling_mean_expected
0   1       1.0             1.5                 1.5
1   1       2.0             2.0                 2.0
2   1       3.0             2.5                 NaN
3   1       NaN             3.5                 NaN
4   1       4.0             4.0                 NaN
5   2       5.0             5.0                 NaN
6   2       NaN             5.5                 NaN
7   2       6.0             6.5                 NaN
8   2       7.0             7.0                 7.0
9   2       8.0             7.5                 7.5

You can see above, using skipna=False inside the mean function does not work as expected and still ignores NANs

Upvotes: 0

Answers (2)

nocibambi

Reputation: 2431

import pandas as pd
import numpy as np


df = pd.DataFrame(
    {
        "grouping": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
        "value_to_avg": [1, 2, 3, np.nan, 4, 5, np.nan, 6, 7, 8],
    }
)

pd.concat(
    [
        df,
        df.groupby("grouping", as_index=False)
        .rolling(window=3, center=True, min_periods=0)
        .apply(lambda x: x.mean() if ~x.isna().any() else np.NaN).rename(columns={'value_to_avg': 'rolling avg'}),
    ],
    axis=1,
).iloc[:, [0, 1, 3]]

>>>
    grouping    value_to_avg    rolling avg
0   1           1.0             1.5
1   1           2.0             2.0
2   1           3.0             NaN
3   1           NaN             NaN
4   1           4.0             NaN
5   2           5.0             NaN
6   2           NaN             NaN
7   2           6.0             NaN
8   2           7.0             7.0
9   2           8.0             7.5

Upvotes: 0

jezrael

Reputation: 863351

For me working custom function with np.mean with convert values to numpy array:

roll_window = 3

db['rolling_mean_actual'] = (db.groupby('grouping')['value_to_avg']
                               .transform(lambda s: s.rolling(roll_window, 
                                                              center=True, 
                                                              min_periods=1)
                               .apply(lambda x: np.mean(x.to_numpy())))

You can avoid transform also:

roll_window = 3

db['rolling_mean_actual'] = (db.groupby('grouping')['value_to_avg']
                               .rolling(roll_window, center=True, min_periods=1)
                               .apply(lambda x: np.mean(x.to_numpy()))
                               .droplevel(0))
print (db)

   grouping  value_to_avg  rolling_mean_actual
0         1           1.0                  1.5
1         1           2.0                  2.0
2         1           3.0                  NaN
3         1           NaN                  NaN
4         1           4.0                  NaN
5         2           5.0                  NaN
6         2           NaN                  NaN
7         2           6.0                  NaN
8         2           7.0                  7.0
9         2           8.0                  7.5

Upvotes: 0

Rolling mean on a groupby with min_periods=1, but NOT ignoring NANs

Answers (2)

Related Questions