Fill NaNs with a rolling mean that combines actual data and previously calculated rolling means

Question

I have tried to find a solution to this but haven't been able to, so I am asking here. I have a dataset for which I want to forecast values for future dates, separately for each group within the dataset. The goal is to fill each NaN with the rolling mean of the previous 5 days, using the actual value where one is available and the previously calculated rolling mean where it is not.

Sample Data


 VALUE  EXPECTED
   5.0         5
  10.0        10
  15.0        15
  20.0        20
  25.0        25
   NaN        15
  50.0        50
   NaN        25
   NaN        27

Here is the code I used to try to get the expected values, but I end up with something weird:

df_grouped_index['RETENTION_FCST_IMPUTED'] = (
    df_grouped_index
    .sort_values(['INSTALLMENT_KEY', 'PLATFORM_SDESC', 'RELATIVE_DAY_KEY', 'DAY_KEY'])
    .groupby(['INSTALLMENT_KEY', 'PLATFORM_SDESC', 'RELATIVE_DAY_KEY'], group_keys=False)
    .apply(lambda x: (
        x['RETENTION_CALCULATED']
        #.fillna(method='ffill')  # Forward fill within each group to avoid NaNs in the rolling mean
        .rolling(6, min_periods=1, win_type=None, method='single')
        .mean()
        .where(x['RETENTION_CALCULATED'].isnull())  # Only apply to original NaNs
        .combine_first(x['RETENTION_CALCULATED'])
    ))
)
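
For reference, the EXPECTED column in the sample data looks like a sequential (recursive) fill: each NaN becomes the mean of the 5 values before it, and a value filled earlier counts as an input for later fills. A plain `rolling(...).mean()` only ever sees the original column, so imputed values are never fed back in. On the sample data the attempt above would give 27.5 for the second NaN (the mean of 15, 20, 25 and 50) rather than 25. Below is a minimal sketch of the recursive version, assuming a window of 5 prior rows and that leading NaNs with no history stay NaN. The function name `fill_with_recursive_rolling_mean` and the `GROUP` column are made up for illustration and are not from the original data.

```python
import numpy as np
import pandas as pd


def fill_with_recursive_rolling_mean(values: pd.Series, window: int = 5) -> pd.Series:
    """Fill NaNs in order, using the mean of the previous `window` values.

    Values filled earlier in the loop are reused as inputs for later fills.
    """
    filled = values.to_numpy(dtype=float, copy=True)
    for i in range(len(filled)):
        if np.isnan(filled[i]):
            # Look back at most `window` rows; these may include earlier fills.
            previous = filled[max(0, i - window):i]
            previous = previous[~np.isnan(previous)]
            if previous.size:  # leave the NaN in place if there is no history
                filled[i] = previous.mean()
    return pd.Series(filled, index=values.index, name=values.name)


# Sample data from the question.
df = pd.DataFrame({"VALUE": [5, 10, 15, 20, 25, np.nan, 50, np.nan, np.nan]})
df["FILLED"] = fill_with_recursive_rolling_mean(df["VALUE"])

# Per-group use with a hypothetical GROUP column: each group is filled
# independently, in the row order the frame already has.
grouped = pd.DataFrame({
    "GROUP": ["a", "a", "a", "b", "b", "b"],
    "VALUE": [1.0, 3.0, np.nan, 10.0, np.nan, np.nan],
})
grouped["FILLED"] = (
    grouped.groupby("GROUP")["VALUE"].transform(fill_with_recursive_rolling_mean)
)
```

Applied to the frame in the question, the same pattern would presumably be to sort by `DAY_KEY` first, then group by `INSTALLMENT_KEY`, `PLATFORM_SDESC` and `RELATIVE_DAY_KEY` and pass this function to `transform` on `RETENTION_CALCULATED`. The loop is plain Python, so it may be slow on very large groups.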

