Reputation: 11
I have tried to find the solution to this but havent been able to. So asking. I have a dataset for which I wish to forecast values for future dates for groups within the dataset. The goal would be to fill NANs with rolling mean average of the last 5 days with actuals where actual available and rolling mean where actual is not available.
Sample Data
VALUE EXPECTED
5.0 5
10.0 10
15.0 15
20.0 20
25.0 25
NaN 15
50.0 50
NaN 25
NaN 27
Here is the code that I used to try to get the expected value but end with something weird:
df_grouped_index['RETENTION_FCST_IMPUTED'] = (
df_grouped_index
.sort_values(['INSTALLMENT_KEY', 'PLATFORM_SDESC', 'RELATIVE_DAY_KEY', 'DAY_KEY'])
.groupby(['INSTALLMENT_KEY', 'PLATFORM_SDESC', 'RELATIVE_DAY_KEY'], group_keys=False)
.apply(lambda x: (
x['RETENTION_CALCULATED']
#.fillna(method='ffill') # Forward fill within each group to avoid NaNs in the rolling mean
.rolling(6, min_periods=1, win_type = None , method= 'single')
.mean()
.where(x['RETENTION_CALCULATED'].isnull()) # Only apply to original NaNs
.combine_first(x['RETENTION_CALCULATED'])
))
)
Upvotes: 1
Views: 39
Reputation: 262194
If you want to use the last 5 values (ignoring NaNs), combine rolling.mean
, shift
and fillna
:
df['out'] = df['VALUE'].fillna(df['VALUE'].rolling(5, min_periods=1).mean().shift())
This gives however slightly different values that in your example.
Output:
VALUE EXPECTED out
0 5.0 5 5.000000
1 10.0 10 10.000000
2 15.0 15 15.000000
3 20.0 20 20.000000
4 25.0 25 25.000000
5 NaN 15 15.000000 # (5+10+15+20+25)/5
6 50.0 50 50.000000
7 NaN 25 27.500000 # (15+20+25+50)/4
8 NaN 27 31.666667 # (20+25+50)/3
If you want to include the previously filled values in the computation, you can't vectorize and would need to use numba
:
from numba import jit
@jit(nopython=True)
def fill_avg(a, window=5):
out = a.copy()
for i, x in enumerate(a):
if np.isnan(x):
out[i] = np.mean(out[max(0, i-window):i])
return out
df['out'] = fill_avg(df['VALUE'].to_numpy())
Output:
VALUE EXPECTED out
0 5.0 5 5.0
1 10.0 10 10.0
2 15.0 15 15.0
3 20.0 20 20.0
4 25.0 25 25.0
5 NaN 15 15.0 # (5+10+15+20+25)/5
6 50.0 50 50.0
7 NaN 25 25.0 # (15+20+25+15+50)/5
8 NaN 27 27.0 # (20+25+15+50+25)/5
Upvotes: 1