Reputation: 5552
This is a simplified version of my data:
Date and Time Price Volume
0 2015-01-01 17:00:00.211 2030.25 342
1 2015-01-01 17:00:02.456 2030.75 203
2 2015-01-02 17:00:00.054 2031.00 182
3 2015-01-02 17:00:25.882 2031.75 249
I would like to calculate cumulative volume per day, so that the end result would be something like:
data['cum_Vol'] = data['Volume'].cumsum()
Output:
Date and Time Price Volume cum_Vol
0 2015-01-01 17:00:00.211 2030.25 342 342
1 2015-01-01 17:00:02.456 2030.75 203 545
2 2015-01-02 17:00:00.054 2031.00 182 182
3 2015-01-02 17:00:25.882 2031.75 249 431
Notice how instead of doing the regular cumsum()
, the calculation re-starts when there's a change in Date, in the example from 2015-01-01 to 2015-01-02.
Upvotes: 2
Views: 1385
Reputation: 176850
The easiest way would probably be to set 'Date and Time' as the index and then use groupby
with TimeGrouper
to group the dates. Then you can apply cumsum()
:
>>> df2 = df.set_index('Date and Time')
>>> df2['Volume'] = df2.groupby(pd.TimeGrouper('D'))['Volume'].cumsum()
>>> df2
Price Volume
DateandTime
2015-01-01 17:00:00.211 2030.25 342
2015-01-01 17:00:02.456 2030.75 545
2015-01-02 17:00:00.054 2031.00 182
2015-01-02 17:00:25.882 2031.75 431
You can always reset the index again afterwards.
Upvotes: 4