Jason Donnald
Jason Donnald

Reputation: 2316

How to do a rolling aggregation of data week wise in python?

I have a dataset whose structure is : Date Profit

A sample of the dataset is:

   Date     Profit
2013-06-21   14
2013-06-22   19
2013-06-23   11
2013-06-24   13
2013-06-25   6
2013-06-26   22
2013-06-27   22
2013-06-28   3
2013-06-29   5
2013-06-30   10
2013-07-01   17
2013-07-02   14
2013-07-03   9
2013-07-04   7

Sample input is :

data = [('2013-06-21',14),
    ('2013-06-22',19),
    ('2013-06-23',11),
    ('2013-06-24',13),
    ('2013-06-25',6),
    ('2013-06-26',22),
    ('2013-06-27',22),
    ('2013-06-28',3),
    ('2013-06-29',5),
    ('2013-06-30',10),
    ('2013-07-01',17),
    ('2013-07-02',14),
    ('2013-07-03',9),
    ('2013-07-04',7)]

Now I want to do a rolling aggregation and store the aggregation. By rolling aggregation I mean that say for week 1 (2013-06-21 to 2013-06-27) I want to add the profit of previous date(s) and store it with current date. So for 2013-06-21 the sum will be 14 only as it is first day of week but then for 2013-06-22 it should be sum of previous date (2013-06-21) and current date (2013-06-22) and this should be stored along with current date. This will go on till end of week and then for next week it will again start over with no previous date for new week. So for first week the sample output should be something like this:

 Date     Profit
2013-06-21   14
2013-06-22   33  #(14 + 19)
2013-06-23   44  #(33 + 11)
2013-06-24   57  #(44 + 13) 
2013-06-25   63  #(57 + 6)
2013-06-26   85  #(63 + 22)
2013-06-27   107 #(85 + 22)

I tried to look at defaultdict and did this:

def aggregate(data, key, value, func):
    measures_dict = collections.defaultdict(list)
    for k,v in zip(data[key], data[value]):
        measures_dict[k].append(v)

return [(k,func(measures_dict[k])) for k in measures_dict.keys()] 

but I am not getting the result and think that defaultdict is not the right way to do it. I also looked at pandas but I am unable to get the way to start to do this. Can anyone help me in doing this rolling aggregation?

Upvotes: 1

Views: 3312

Answers (2)

John
John

Reputation: 180

Just to a small fix in the @liam-foley answer:

df['year'] = df.index.year
df['weekofyear'] = df.index.weekofyear
df['Weekly_Cum'] = df.groupby(['year', 'weekofyear']).cumsum()

Otherwise the cumsum will calculate the same weekday from all years in the index.

Upvotes: 0

Liam Foley
Liam Foley

Reputation: 7822

See this answer: Cumulative sum and percentage on column?

and this: http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dt-accessors and this: http://pandas.pydata.org/pandas-docs/stable/groupby.html

Updated for Weekly Cumulative:

df = pd.DataFrame(data)
df.columns = ['Date','Profit']
df['Date'] = pd.to_datetime(df['Date'])
df['weekofyear'] = df['Date'].dt.weekofyear
df.reset_index('Date')
df.sort_index(inplace=True)
df['Weekly_Cum'] = df.groupby('weekofyear').cumsum()

Output:

         Date  Profit  weekofyear  Weekly_Cum
0  2013-06-21      14          25          14
1  2013-06-22      19          25          33
2  2013-06-23      11          25          44
3  2013-06-24      13          26          13
4  2013-06-25       6          26          19
5  2013-06-26      22          26          41
6  2013-06-27      22          26          63
7  2013-06-28       3          26          66
8  2013-06-29       5          26          71
9  2013-06-30      10          26          81
10 2013-07-01      17          27          17
11 2013-07-02      14          27          31
12 2013-07-03       9          27          40
13 2013-07-04       7          27          47

Upvotes: 6

Related Questions