Reputation: 300
Currently I have a df with a location_key & year_month multiIndex. I want to create a sum using a rolling window for 3 months.
(pd.DataFrame(df.groupby(['LOCATION_KEY','YEAR_MONTH'])['SALES'].count()).sort_index()).groupby(level=(0)).apply(lambda x: x.rolling(window=3).sum())
The window is working properly the issue is that in months where there were no sales instead of counting an empty month instead it counts the another month.
e.g. in the data below, 2016-03 Sales is the sum of 2016-03, 2016-01, 2015-12 as opposed to what I would like: 2016-03, 2016-02, 2016-01.
LOCATION_KE YEAR_MONTH SALES
A 2015-10 NaN
2015-11 NaN
2015-12 200
2016-01 220
2016-03 180
B 2015-04 NaN
2015-05 NaN
2015-06 119
2015-07 120
Upvotes: 0
Views: 68
Reputation: 2894
Basically you have to get your index set up how you want so the rolling window has zeros to process.
df
LOCATION_KE YEAR_MONTH SALES
0 A 2015-10-01 NaN
1 A 2015-11-01 NaN
2 A 2015-12-01 200.0
3 A 2016-01-01 220.0
4 A 2016-03-01 180.0
5 B 2015-04-01 NaN
6 B 2015-05-01 NaN
7 B 2015-06-01 119.0
8 B 2015-07-01 120.0
df['SALES'] = df['SALES'].fillna(0)
df.index = [df["LOCATION_KE"], df["YEAR_MONTH"]]
df
LOCATION_KE YEAR_MONTH SALES
LOCATION_KE YEAR_MONTH
A 2015-10-01 A 2015-10-01 0.0
2015-11-01 A 2015-11-01 0.0
2015-12-01 A 2015-12-01 200.0
2016-01-01 A 2016-01-01 220.0
2016-03-01 A 2016-03-01 180.0
B 2015-04-01 B 2015-04-01 0.0
2015-05-01 B 2015-05-01 0.0
2015-06-01 B 2015-06-01 119.0
2015-07-01 B 2015-07-01 120.0
df = df.reindex(pd.MultiIndex.from_product([df['LOCATION_KE'],
pd.date_range("20150101", periods=24, freq='MS')],
names=['location', 'month']))
df['SALES'].fillna(0).reset_index(level=0).groupby('location').rolling(3).sum().fillna(0)
location SALES
location month
A 2015-01-01 A 0.0
2015-02-01 A 0.0
2015-03-01 A 0.0
2015-04-01 A 0.0
2015-05-01 A 0.0
2015-06-01 A 0.0
2015-07-01 A 0.0
2015-08-01 A 0.0
2015-09-01 A 0.0
2015-10-01 A 0.0
2015-11-01 A 0.0
2015-12-01 A 200.0
2016-01-01 A 420.0
2016-02-01 A 420.0
2016-03-01 A 400.0
2016-04-01 A 180.0
2016-05-01 A 180.0
2016-06-01 A 0.0
2016-07-01 A 0.0
2016-08-01 A 0.0
2016-09-01 A 0.0
2016-10-01 A 0.0
2016-11-01 A 0.0
2016-12-01 A 0.0
2015-01-01 A 0.0
2015-02-01 A 0.0
2015-03-01 A 0.0
2015-04-01 A 0.0
2015-05-01 A 0.0
2015-06-01 A 0.0
... ... ...
B 2016-07-01 B 0.0
2016-08-01 B 0.0
2016-09-01 B 0.0
2016-10-01 B 0.0
2016-11-01 B 0.0
2016-12-01 B 0.0
2015-01-01 B 0.0
2015-02-01 B 0.0
2015-03-01 B 0.0
2015-04-01 B 0.0
2015-05-01 B 0.0
2015-06-01 B 119.0
2015-07-01 B 239.0
2015-08-01 B 239.0
2015-09-01 B 120.0
2015-10-01 B 0.0
2015-11-01 B 0.0
2015-12-01 B 0.0
2016-01-01 B 0.0
2016-02-01 B 0.0
2016-03-01 B 0.0
2016-04-01 B 0.0
2016-05-01 B 0.0
2016-06-01 B 0.0
2016-07-01 B 0.0
2016-08-01 B 0.0
2016-09-01 B 0.0
2016-10-01 B 0.0
2016-11-01 B 0.0
2016-12-01 B 0.0
I think if you have a up to date pandas you can leave out the reset_index.
Upvotes: 1