Reputation: 3
So far, I've only seen questions being asked about how to ignore NANs while doing a rolling mean on a groupby. But my case is the opposite. I want to include the NANs such that if even one of the values in the rolling windows is NAN, I want the resulting rolling mean to be NAN as well.
Input:
grouping value_to_avg
0 1 1.0
1 1 2.0
2 1 3.0
3 1 NaN
4 1 4.0
5 2 5.0
6 2 NaN
7 2 6.0
8 2 7.0
9 2 8.0
Code to create sample input:
data = {'grouping': [1,1,1,1,1,2,2,2,2,2], 'value_to_avg': [1,2,3,np.nan,4,5,np.nan,6,7,8]}
db = pd.DataFrame(data)
Code that I have tried:
db['rolling_mean_actual'] = db.groupby('grouping')['value_to_avg'].transform(lambda s: s.rolling(window=3, center=True, min_periods=1).mean(skipna=False))
Actual vs. expected output:
grouping value_to_avg rolling_mean_actual rolling_mean_expected
0 1 1.0 1.5 1.5
1 1 2.0 2.0 2.0
2 1 3.0 2.5 NaN
3 1 NaN 3.5 NaN
4 1 4.0 4.0 NaN
5 2 5.0 5.0 NaN
6 2 NaN 5.5 NaN
7 2 6.0 6.5 NaN
8 2 7.0 7.0 7.0
9 2 8.0 7.5 7.5
You can see above, using skipna=False inside the mean function does not work as expected and still ignores NANs
Upvotes: 0
Views: 255
Reputation: 2431
import pandas as pd
import numpy as np
df = pd.DataFrame(
{
"grouping": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
"value_to_avg": [1, 2, 3, np.nan, 4, 5, np.nan, 6, 7, 8],
}
)
pd.concat(
[
df,
df.groupby("grouping", as_index=False)
.rolling(window=3, center=True, min_periods=0)
.apply(lambda x: x.mean() if ~x.isna().any() else np.NaN).rename(columns={'value_to_avg': 'rolling avg'}),
],
axis=1,
).iloc[:, [0, 1, 3]]
>>>
grouping value_to_avg rolling avg
0 1 1.0 1.5
1 1 2.0 2.0
2 1 3.0 NaN
3 1 NaN NaN
4 1 4.0 NaN
5 2 5.0 NaN
6 2 NaN NaN
7 2 6.0 NaN
8 2 7.0 7.0
9 2 8.0 7.5
Upvotes: 0
Reputation: 863351
For me working custom function with np.mean
with convert values to numpy array:
roll_window = 3
db['rolling_mean_actual'] = (db.groupby('grouping')['value_to_avg']
.transform(lambda s: s.rolling(roll_window,
center=True,
min_periods=1)
.apply(lambda x: np.mean(x.to_numpy())))
You can avoid transform
also:
roll_window = 3
db['rolling_mean_actual'] = (db.groupby('grouping')['value_to_avg']
.rolling(roll_window, center=True, min_periods=1)
.apply(lambda x: np.mean(x.to_numpy()))
.droplevel(0))
print (db)
grouping value_to_avg rolling_mean_actual
0 1 1.0 1.5
1 1 2.0 2.0
2 1 3.0 NaN
3 1 NaN NaN
4 1 4.0 NaN
5 2 5.0 NaN
6 2 NaN NaN
7 2 6.0 NaN
8 2 7.0 7.0
9 2 8.0 7.5
Upvotes: 0