Vince Miller
Vince Miller

Reputation: 300

How to create rolling window variables without skipping months when using a multiIndex?

Currently I have a df with a location_key & year_month multiIndex. I want to create a sum using a rolling window for 3 months.

(pd.DataFrame(df.groupby(['LOCATION_KEY','YEAR_MONTH'])['SALES'].count()).sort_index()).groupby(level=(0)).apply(lambda x: x.rolling(window=3).sum())

The window is working properly the issue is that in months where there were no sales instead of counting an empty month instead it counts the another month.

e.g. in the data below, 2016-03 Sales is the sum of 2016-03, 2016-01, 2015-12 as opposed to what I would like: 2016-03, 2016-02, 2016-01.

LOCATION_KE YEAR_MONTH SALES A 2015-10 NaN 2015-11 NaN 2015-12 200 2016-01 220 2016-03 180 B 2015-04 NaN 2015-05 NaN 2015-06 119 2015-07 120

Upvotes: 0

Views: 68

Answers (1)

Kyle
Kyle

Reputation: 2894

Basically you have to get your index set up how you want so the rolling window has zeros to process.

df
  LOCATION_KE YEAR_MONTH  SALES
0           A 2015-10-01    NaN
1           A 2015-11-01    NaN
2           A 2015-12-01  200.0
3           A 2016-01-01  220.0
4           A 2016-03-01  180.0
5           B 2015-04-01    NaN
6           B 2015-05-01    NaN
7           B 2015-06-01  119.0
8           B 2015-07-01  120.0

df['SALES'] = df['SALES'].fillna(0)

df.index = [df["LOCATION_KE"], df["YEAR_MONTH"]]

df
                       LOCATION_KE YEAR_MONTH  SALES
LOCATION_KE YEAR_MONTH
A           2015-10-01           A 2015-10-01    0.0
            2015-11-01           A 2015-11-01    0.0
            2015-12-01           A 2015-12-01  200.0
            2016-01-01           A 2016-01-01  220.0
            2016-03-01           A 2016-03-01  180.0
B           2015-04-01           B 2015-04-01    0.0
            2015-05-01           B 2015-05-01    0.0
            2015-06-01           B 2015-06-01  119.0
            2015-07-01           B 2015-07-01  120.0


df = df.reindex(pd.MultiIndex.from_product([df['LOCATION_KE'], 
                                            pd.date_range("20150101", periods=24, freq='MS')],
                                           names=['location', 'month']))



df['SALES'].fillna(0).reset_index(level=0).groupby('location').rolling(3).sum().fillna(0)
                    location  SALES
location month
A        2015-01-01        A    0.0
         2015-02-01        A    0.0
         2015-03-01        A    0.0
         2015-04-01        A    0.0
         2015-05-01        A    0.0
         2015-06-01        A    0.0
         2015-07-01        A    0.0
         2015-08-01        A    0.0
         2015-09-01        A    0.0
         2015-10-01        A    0.0
         2015-11-01        A    0.0
         2015-12-01        A  200.0
         2016-01-01        A  420.0
         2016-02-01        A  420.0
         2016-03-01        A  400.0
         2016-04-01        A  180.0
         2016-05-01        A  180.0
         2016-06-01        A    0.0
         2016-07-01        A    0.0
         2016-08-01        A    0.0
         2016-09-01        A    0.0
         2016-10-01        A    0.0
         2016-11-01        A    0.0
         2016-12-01        A    0.0
         2015-01-01        A    0.0
         2015-02-01        A    0.0
         2015-03-01        A    0.0
         2015-04-01        A    0.0
         2015-05-01        A    0.0
         2015-06-01        A    0.0
...                      ...    ...
B        2016-07-01        B    0.0
         2016-08-01        B    0.0
         2016-09-01        B    0.0
         2016-10-01        B    0.0
         2016-11-01        B    0.0
         2016-12-01        B    0.0
         2015-01-01        B    0.0
         2015-02-01        B    0.0
         2015-03-01        B    0.0
         2015-04-01        B    0.0
         2015-05-01        B    0.0
         2015-06-01        B  119.0
         2015-07-01        B  239.0
         2015-08-01        B  239.0
         2015-09-01        B  120.0
         2015-10-01        B    0.0
         2015-11-01        B    0.0
         2015-12-01        B    0.0
         2016-01-01        B    0.0
         2016-02-01        B    0.0
         2016-03-01        B    0.0
         2016-04-01        B    0.0
         2016-05-01        B    0.0
         2016-06-01        B    0.0
         2016-07-01        B    0.0
         2016-08-01        B    0.0
         2016-09-01        B    0.0
         2016-10-01        B    0.0
         2016-11-01        B    0.0
         2016-12-01        B    0.0

I think if you have a up to date pandas you can leave out the reset_index.

Upvotes: 1

Related Questions