malte
malte

Reputation: 559

Pandas rolling sum with unevenly spaced index

I have a dataframe containing weekly sales for different products (a, b, c). If there were zero sales in a given week (e.g. week 4), there is no record for that week:

In[1]
df = pd.DataFrame({'product': list('aaaabbbbcccc'),
               'week': [1, 2, 3, 5, 1, 2, 3, 5, 1, 2, 3, 4],
               'sales': np.power(2, range(12))})
Out[1]
   product  sales  week
0        a      1     1
1        a      2     2
2        a      4     3
3        a      8     5
4        b     16     1
5        b     32     2
6        b     64     3
7        b    128     5
8        c    256     1
9        c    512     2
10       c   1024     3
11       c   2048     4

I would like to create a new column containing the cumulative sales for the previous n weeks, grouped by product. E.g. for n=2 it should be like last_2_weeks:

   product  sales  week  last_2_weeks
0        a      1     1             0
1        a      2     2             1
2        a      4     3             3
3        a      8     5             4
4        b     16     1             0
5        b     32     2            16
6        b     64     3            48
7        b    128     5            64
8        c    256     1             0
9        c    512     2           256
10       c   1024     3           768
11       c   2048     4          1536

If there was a record for every week, I could just use rolling_sum as described in this question.

Is there a way to set 'week' as an index and only calculate the sum on that index? Or could I resample 'week' and set sales to zero for all missing rows?

Upvotes: 4

Views: 923

Answers (2)

pbarber
pbarber

Reputation: 139

You can use pivot to create a table which will auto-fill the missing values. This works provided that there is at least one entry for each week in your original data, reindex can be used to ensure that there is a row in the table for every week.

This can then have rolling_sum applied to it:

import pandas, numpy
df = pandas.DataFrame({'product': list('aaaabbbbcccc'),
              'week': [1, 2, 3, 5, 1, 2, 3, 5, 1, 2, 3, 4],
               'sales': numpy.power(2, range(12))})
sales = df.pivot(index='week', columns='product')
# Cope with weeks when there were no sales at all
sales = sales.reindex(range(min(sales.index), 1+max(sales.index))).fillna(0)
# Calculate the sum for the preceding two weeks
pandas.rolling_sum(sales, 3, min_periods=1)-sales

This gives the following result, which looks to match the desired (in that it provides the sum for the preceding two weeks):

product     a   b     c
week                   
1           0   0     0
2           1  16   256
3           3  48   768
4           6  96  1536
5           4  64  3072

Upvotes: 0

jezrael
jezrael

Reputation: 862511

Resample is only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex.
But reindex is possible with integers.

Firstly column week is set to index. Then df is grouped by column product and apply reindex by max values of index of each group. Missing values are filled by 0.

import pandas as pd
import numpy as np

df = pd.DataFrame({'product': list('aaaabbbbcccc'),
               'week': [1, 2, 3, 5, 1, 2, 3, 5, 1, 2, 3, 4],
               'sales': np.power(2, range(12))})

df = df.set_index('week')   

def reindex_by_max_index_of_group(df):
    index = range(1, max(df.index) + 1)
    return df.reindex(index, fill_value=0)

df = df.groupby('product').apply(reindex_by_max_index_of_group)

df.drop(['product'], inplace=True, axis=1)
print df.reset_index()

#   product  week  sales
#0        a     1      1
#1        a     2      2
#2        a     3      4
#3        a     4      0
#4        a     5      8
#5        b     1     16
#6        b     2     32
#7        b     3     64
#8        b     4      0
#9        b     5    128
#10       c     1    256
#11       c     2    512
#12       c     3   1024
#13       c     4   2048

Upvotes: 1

Related Questions