Pandas rolling sum with unevenly spaced index

Question

I have a dataframe containing weekly sales for different products (a, b, c). If there were zero sales in a given week (e.g. week 4), there is no record for that week:

In[1]
df = pd.DataFrame({'product': list('aaaabbbbcccc'),
               'week': [1, 2, 3, 5, 1, 2, 3, 5, 1, 2, 3, 4],
               'sales': np.power(2, range(12))})
Out[1]
   product  sales  week
0        a      1     1
1        a      2     2
2        a      4     3
3        a      8     5
4        b     16     1
5        b     32     2
6        b     64     3
7        b    128     5
8        c    256     1
9        c    512     2
10       c   1024     3
11       c   2048     4

I would like to create a new column containing the cumulative sales for the previous n weeks, grouped by product. E.g. for n=2 it should be like last_2_weeks:

   product  sales  week  last_2_weeks
0        a      1     1             0
1        a      2     2             1
2        a      4     3             3
3        a      8     5             4
4        b     16     1             0
5        b     32     2            16
6        b     64     3            48
7        b    128     5            64
8        c    256     1             0
9        c    512     2           256
10       c   1024     3           768
11       c   2048     4          1536

If there was a record for every week, I could just use rolling_sum as described in this question.

Is there a way to set 'week' as an index and only calculate the sum on that index? Or could I resample 'week' and set sales to zero for all missing rows?

jezrael · Accepted Answer

Resample is only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex.
But reindex is possible with integers.

Firstly column week is set to index. Then df is grouped by column product and apply reindex by max values of index of each group. Missing values are filled by 0.

import pandas as pd
import numpy as np

df = pd.DataFrame({'product': list('aaaabbbbcccc'),
               'week': [1, 2, 3, 5, 1, 2, 3, 5, 1, 2, 3, 4],
               'sales': np.power(2, range(12))})

df = df.set_index('week')   

def reindex_by_max_index_of_group(df):
    index = range(1, max(df.index) + 1)
    return df.reindex(index, fill_value=0)

df = df.groupby('product').apply(reindex_by_max_index_of_group)

df.drop(['product'], inplace=True, axis=1)
print df.reset_index()

#   product  week  sales
#0        a     1      1
#1        a     2      2
#2        a     3      4
#3        a     4      0
#4        a     5      8
#5        b     1     16
#6        b     2     32
#7        b     3     64
#8        b     4      0
#9        b     5    128
#10       c     1    256
#11       c     2    512
#12       c     3   1024
#13       c     4   2048

Pandas rolling sum with unevenly spaced index

Answers (2)

Related Questions