Reputation: 2316
I have a dataset whose structure is : Date Profit
A sample of the dataset is:
Date Profit
2013-06-21 14
2013-06-22 19
2013-06-23 11
2013-06-24 13
2013-06-25 6
2013-06-26 22
2013-06-27 22
2013-06-28 3
2013-06-29 5
2013-06-30 10
2013-07-01 17
2013-07-02 14
2013-07-03 9
2013-07-04 7
Sample input
is :
data = [('2013-06-21',14),
('2013-06-22',19),
('2013-06-23',11),
('2013-06-24',13),
('2013-06-25',6),
('2013-06-26',22),
('2013-06-27',22),
('2013-06-28',3),
('2013-06-29',5),
('2013-06-30',10),
('2013-07-01',17),
('2013-07-02',14),
('2013-07-03',9),
('2013-07-04',7)]
Now I want to do a rolling aggregation
and store the aggregation. By rolling aggregation I mean that say for week 1 (2013-06-21 to 2013-06-27) I want to add the profit of previous date(s) and store it with current date. So for 2013-06-21
the sum will be 14
only as it is first day of week but then for 2013-06-22
it should be sum of previous date (2013-06-21)
and current date (2013-06-22)
and this should be stored along with current date. This will go on till end of week and then for next week it will again start over with no previous date for new week. So for first week the sample output
should be something like this:
Date Profit
2013-06-21 14
2013-06-22 33 #(14 + 19)
2013-06-23 44 #(33 + 11)
2013-06-24 57 #(44 + 13)
2013-06-25 63 #(57 + 6)
2013-06-26 85 #(63 + 22)
2013-06-27 107 #(85 + 22)
I tried to look at defaultdict
and did this:
def aggregate(data, key, value, func):
measures_dict = collections.defaultdict(list)
for k,v in zip(data[key], data[value]):
measures_dict[k].append(v)
return [(k,func(measures_dict[k])) for k in measures_dict.keys()]
but I am not getting the result and think that defaultdict
is not the right way to do it. I also looked at pandas
but I am unable to get the way to start to do this. Can anyone help me in doing this rolling aggregation?
Upvotes: 1
Views: 3312
Reputation: 180
Just to a small fix in the @liam-foley answer:
df['year'] = df.index.year
df['weekofyear'] = df.index.weekofyear
df['Weekly_Cum'] = df.groupby(['year', 'weekofyear']).cumsum()
Otherwise the cumsum will calculate the same weekday from all years in the index.
Upvotes: 0
Reputation: 7822
See this answer: Cumulative sum and percentage on column?
and this: http://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dt-accessors and this: http://pandas.pydata.org/pandas-docs/stable/groupby.html
Updated for Weekly Cumulative:
df = pd.DataFrame(data)
df.columns = ['Date','Profit']
df['Date'] = pd.to_datetime(df['Date'])
df['weekofyear'] = df['Date'].dt.weekofyear
df.reset_index('Date')
df.sort_index(inplace=True)
df['Weekly_Cum'] = df.groupby('weekofyear').cumsum()
Output:
Date Profit weekofyear Weekly_Cum
0 2013-06-21 14 25 14
1 2013-06-22 19 25 33
2 2013-06-23 11 25 44
3 2013-06-24 13 26 13
4 2013-06-25 6 26 19
5 2013-06-26 22 26 41
6 2013-06-27 22 26 63
7 2013-06-28 3 26 66
8 2013-06-29 5 26 71
9 2013-06-30 10 26 81
10 2013-07-01 17 27 17
11 2013-07-02 14 27 31
12 2013-07-03 9 27 40
13 2013-07-04 7 27 47
Upvotes: 6