How to do a rolling aggregation of data week wise in python?

Question

I have a dataset whose structure is : Date Profit

A sample of the dataset is:

   Date     Profit
2013-06-21   14
2013-06-22   19
2013-06-23   11
2013-06-24   13
2013-06-25   6
2013-06-26   22
2013-06-27   22
2013-06-28   3
2013-06-29   5
2013-06-30   10
2013-07-01   17
2013-07-02   14
2013-07-03   9
2013-07-04   7

Sample input is :

data = [('2013-06-21',14),
    ('2013-06-22',19),
    ('2013-06-23',11),
    ('2013-06-24',13),
    ('2013-06-25',6),
    ('2013-06-26',22),
    ('2013-06-27',22),
    ('2013-06-28',3),
    ('2013-06-29',5),
    ('2013-06-30',10),
    ('2013-07-01',17),
    ('2013-07-02',14),
    ('2013-07-03',9),
    ('2013-07-04',7)]

Now I want to do a rolling aggregation and store the aggregation. By rolling aggregation I mean that say for week 1 (2013-06-21 to 2013-06-27) I want to add the profit of previous date(s) and store it with current date. So for 2013-06-21 the sum will be 14 only as it is first day of week but then for 2013-06-22 it should be sum of previous date (2013-06-21) and current date (2013-06-22) and this should be stored along with current date. This will go on till end of week and then for next week it will again start over with no previous date for new week. So for first week the sample output should be something like this:

 Date     Profit
2013-06-21   14
2013-06-22   33  #(14 + 19)
2013-06-23   44  #(33 + 11)
2013-06-24   57  #(44 + 13) 
2013-06-25   63  #(57 + 6)
2013-06-26   85  #(63 + 22)
2013-06-27   107 #(85 + 22)

I tried to look at defaultdict and did this:

def aggregate(data, key, value, func):
    measures_dict = collections.defaultdict(list)
    for k,v in zip(data[key], data[value]):
        measures_dict[k].append(v)

return [(k,func(measures_dict[k])) for k in measures_dict.keys()]

but I am not getting the result and think that defaultdict is not the right way to do it. I also looked at pandas but I am unable to get the way to start to do this. Can anyone help me in doing this rolling aggregation?

Liam Foley · Accepted Answer

See this answer: Cumulative sum and percentage on column?

and this: http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dt-accessors and this: http://pandas.pydata.org/pandas-docs/stable/groupby.html

Updated for Weekly Cumulative:

df = pd.DataFrame(data)
df.columns = ['Date','Profit']
df['Date'] = pd.to_datetime(df['Date'])
df['weekofyear'] = df['Date'].dt.weekofyear
df.reset_index('Date')
df.sort_index(inplace=True)
df['Weekly_Cum'] = df.groupby('weekofyear').cumsum()

Output:

         Date  Profit  weekofyear  Weekly_Cum
0  2013-06-21      14          25          14
1  2013-06-22      19          25          33
2  2013-06-23      11          25          44
3  2013-06-24      13          26          13
4  2013-06-25       6          26          19
5  2013-06-26      22          26          41
6  2013-06-27      22          26          63
7  2013-06-28       3          26          66
8  2013-06-29       5          26          71
9  2013-06-30      10          26          81
10 2013-07-01      17          27          17
11 2013-07-02      14          27          31
12 2013-07-03       9          27          40
13 2013-07-04       7          27          47

How to do a rolling aggregation of data week wise in python?

Answers (2)

Related Questions