Reputation: 1257
I have a DataFrame that looks like:
f_period f_year f_month subject month year value
20140102 2014 1 a 1 2018 10
20140109 2014 1 a 1 2018 12
20140116 2014 1 a 1 2018 8
20140202 2014 2 a 1 2018 20
20140209 2014 2 a 1 2018 15
20140102 2014 1 b 1 2018 10
20140109 2014 1 b 1 2018 12
20140116 2014 1 b 1 2018 8
20140202 2014 2 b 1 2018 20
20140209 2014 2 b 1 2018 15
The f_period
is the date when a forecast for a SKU (column subject
) was made. The month
and year
column is the period for which the forecast was made. For example, the first row says that on 01/02/2018
, the model was forecasting to set 10
units of product a
in month 1
of year2018
.
I am trying to create a rolling average prediction by subject
, by month
for 2 f_months
. The DataFrame should look like:
f_period f_year f_month subject month year value mnthly_avg rolling_2_avg
20140102 2014 1 a 1 2018 10 10 13
20140109 2014 1 a 1 2018 12 10 13
20140116 2014 1 a 1 2018 8 10 13
20140202 2014 2 a 1 2018 20 17.5 null
20140209 2014 2 a 1 2018 15 17.5 null
20140102 2014 1 b 1 2018 10 10 13
20140109 2014 1 b 1 2018 12 10 13
20140116 2014 1 b 1 2018 8 10 13
20140202 2014 2 b 1 2018 20 17.5 null
20140209 2014 2 b 1 2018 15 17.5 null
Things I tried:
I was able to get mnthly_avg
by :
data_df['monthly_avg'] = data_df.groupby(['f_month', 'f_year', 'year', 'month', 'period', 'subject']).\
value.transform('mean')
I tried getting the rolling_2_avg
:
rolling_monthly_df = data_df[['f_year', 'f_month', 'subject', 'month', 'year', 'value', 'f_period']].\
groupby(['f_year', 'f_month', 'subject', 'month', 'year']).value.mean().reset_index()
rolling_monthly_df['rolling_2_avg'] = rolling_monthly_df.groupby(['subject', 'month']).\
value.rolling(2).mean().reset_index(drop=True)
This gave me an unexpected output. I don't understand how it calculated the values for rolling_2_avg
How do I group by subject
and month
and then sort by f_month
and then take the average of the next two-month average?
Upvotes: 1
Views: 5436
Reputation: 585
I would be a bit careful with Josh's solution. If you want to group by the subject you can't use the rolling function like that as it will roll across subjects (i.e. it will eventually take the mean of a month from subject A and B, rather than giving a null which you might prefer).
An alternative can be to split the dataframe and run the rolling individually (I noticed that you want the nulls by the end of the dataframe, whereas you might wanna sort the dataframe before and after):
for unique_subject in df['subject'].unique():
df_subject = df[df['subject'] == unique_subject]
df_subject['rolling'] = df_subject['value'].rolling(window=2).mean()
print(df_subject) # just to print, you may wanna concatenate these
Upvotes: 1
Reputation: 11657
Unless I'm misunderstanding it seems simpler than what you've done. What about this?
grp = pd.DataFrame(df.groupby(['subject', 'month', 'f_month'])['value'].sum())
grp['rolling'] = grp.rolling(window=2).mean()
grp
Output:
value rolling
subject month f_month
a 1 1 30 NaN
2 35 32.5
b 1 1 30 32.5
2 35 32.5
Upvotes: 2