Reputation: 4529
I'm stacked, and I need the wisdom of the stackoverflow.
I want to create a simple averaging model for sales based on the day of the week from past using pandas.
I mean, the model predicts value of Sales for next Monday based on the last 4 Mondays. Also, if there are no enough weeks in the past, just make an average of existing weeks.
Sample of my time-series df:
weekDay weekNum Sales
Date
2013-01-01 2 1 22
2013-01-02 3 1 33
2013-01-03 4 1 44
2013-01-04 5 1 55
2013-01-05 6 1 66
2013-01-06 7 2 76
2013-01-07 1 2 23
2013-01-08 2 2 55
2013-01-09 3 2 34
2013-01-10 4 2 43
2013-01-11 5 2 34
2013-01-12 6 2 53
2013-01-13 7 3 52
2013-01-14 1 3 41
2013-01-15 2 3 31
2013-01-16 3 3 31
2013-01-17 4 3 42
2013-01-18 5 3 23
2013-01-19 6 3 41
2013-01-20 7 4 31
2013-01-21 1 4 31
2013-01-22 2 4 31
2013-01-23 3 4 43
2013-01-24 4 4 53
2013-01-25 5 4 32
2013-01-26 6 4 12
2013-01-27 7 5 41
2013-01-28 1 5 41
2013-01-29 2 5 12
2013-01-30 3 5 76
2013-01-31 4 5 43
2013-02-01 5 5 32
2013-02-02 6 5 54
2013-02-03 7 6 43
2013-02-04 1 6 43
2013-02-05 2 6 12
2013-02-06 3 6 12
2013-02-07 4 6 43
2013-02-08 5 6 22
2013-02-09 6 6 12
What I've tried:
ts.resample('D').rolling(28).mean()["Sales"]
ts.resample('W-MON').rolling(1).mean()["Sales"]
but it didn't help.
Upvotes: 4
Views: 4281
Reputation: 1200
Ok, Sorry, I think I understand now what you are looking for:
res = df.join(df.groupby('weekDay').rolling('28d').mean(), rsuffix='_avg')
so now res
contains:
weekDay weekNum Sales weekDay_avg weekNum_avg Sales_avg weekDay date 1 2013-01-07 1 2 23 1.0 2.0 23.000000 2013-01-14 1 3 41 1.0 2.5 32.000000 2013-01-21 1 4 31 1.0 3.0 31.666667 2013-01-28 1 5 41 1.0 3.5 34.000000 2013-02-04 1 6 43 1.0 4.5 39.000000 2 2013-01-01 2 1 22 2.0 1.0 22.000000 2013-01-08 2 2 55 2.0 1.5 38.500000 2013-01-15 2 3 31 2.0 2.0 36.000000 2013-01-22 2 4 31 2.0 2.5 34.750000 2013-01-29 2 5 12 2.0 3.5 32.250000 2013-02-05 2 6 12 2.0 4.5 21.500000 3 2013-01-02 3 1 33 3.0 1.0 33.000000 2013-01-09 3 2 34 3.0 1.5 33.500000 2013-01-16 3 3 31 3.0 2.0 32.666667 2013-01-23 3 4 43 3.0 2.5 35.250000 2013-01-30 3 5 76 3.0 3.5 46.000000 2013-02-06 3 6 12 3.0 4.5 40.500000 4 2013-01-03 4 1 44 4.0 1.0 44.000000 2013-01-10 4 2 43 4.0 1.5 43.500000 2013-01-17 4 3 42 4.0 2.0 43.000000 2013-01-24 4 4 53 4.0 2.5 45.500000 2013-01-31 4 5 43 4.0 3.5 45.250000 2013-02-07 4 6 43 4.0 4.5 45.250000 5 2013-01-04 5 1 55 5.0 1.0 55.000000 2013-01-11 5 2 34 5.0 1.5 44.500000 2013-01-18 5 3 23 5.0 2.0 37.333333 2013-01-25 5 4 32 5.0 2.5 36.000000 2013-02-01 5 5 32 5.0 3.5 30.250000 2013-02-08 5 6 22 5.0 4.5 27.250000 6 2013-01-05 6 1 66 6.0 1.0 66.000000 2013-01-12 6 2 53 6.0 1.5 59.500000 2013-01-19 6 3 41 6.0 2.0 53.333333 2013-01-26 6 4 12 6.0 2.5 43.000000 2013-02-02 6 5 54 6.0 3.5 40.000000 2013-02-09 6 6 12 6.0 4.5 29.750000 7 2013-01-06 7 2 76 7.0 2.0 76.000000 2013-01-13 7 3 52 7.0 2.5 64.000000 2013-01-20 7 4 31 7.0 3.0 53.000000 2013-01-27 7 5 41 7.0 3.5 50.000000 2013-02-03 7 6 43 7.0 4.5 41.750000
and in order to make it similar to your good ol' DataFrame that we began with:
res.drop(['weekDay', 'weekDay_avg', 'weekNum_avg'], axis=1, inplace=True)
res.reset_index(inplace=True)
res.set_index('date', inplace=True)
res.sort_index(inplace=True)
so now it looks like this:
weekDay weekNum Sales Sales_avg date 2013-01-01 2 1 22 22.000000 2013-01-02 3 1 33 33.000000 2013-01-03 4 1 44 44.000000 2013-01-04 5 1 55 55.000000 2013-01-05 6 1 66 66.000000 2013-01-06 7 2 76 76.000000 2013-01-07 1 2 23 23.000000 2013-01-08 2 2 55 38.500000 2013-01-09 3 2 34 33.500000 2013-01-10 4 2 43 43.500000 2013-01-11 5 2 34 44.500000 2013-01-12 6 2 53 59.500000 2013-01-13 7 3 52 64.000000 2013-01-14 1 3 41 32.000000 2013-01-15 2 3 31 36.000000 2013-01-16 3 3 31 32.666667 2013-01-17 4 3 42 43.000000 2013-01-18 5 3 23 37.333333 2013-01-19 6 3 41 53.333333 2013-01-20 7 4 31 53.000000 2013-01-21 1 4 31 31.666667 2013-01-22 2 4 31 34.750000 2013-01-23 3 4 43 35.250000 2013-01-24 4 4 53 45.500000 2013-01-25 5 4 32 36.000000 2013-01-26 6 4 12 43.000000 2013-01-27 7 5 41 50.000000 2013-01-28 1 5 41 34.000000 2013-01-29 2 5 12 32.250000 2013-01-30 3 5 76 46.000000 2013-01-31 4 5 43 45.250000 2013-02-01 5 5 32 30.250000 2013-02-02 6 5 54 40.000000 2013-02-03 7 6 43 41.750000 2013-02-04 1 6 43 39.000000 2013-02-05 2 6 12 21.500000 2013-02-06 3 6 12 40.500000 2013-02-07 4 6 43 45.250000 2013-02-08 5 6 22 27.250000 2013-02-09 6 6 12 29.750000
Hope this time this really helps (:
Upvotes: 3