hernanavella
hernanavella

Reputation: 5552

How to perform cumulative calculations in pandas that restart with each change in date?

This is a simplified version of my data:

    Date and Time           Price   Volume
0   2015-01-01 17:00:00.211 2030.25 342
1   2015-01-01 17:00:02.456 2030.75 203
2   2015-01-02 17:00:00.054 2031.00 182
3   2015-01-02 17:00:25.882 2031.75 249

I would like to calculate cumulative volume per day, so that the end result would be something like:

data['cum_Vol'] = data['Volume'].cumsum()

Output:

    Date and Time           Price   Volume cum_Vol
0   2015-01-01 17:00:00.211 2030.25 342    342
1   2015-01-01 17:00:02.456 2030.75 203    545
2   2015-01-02 17:00:00.054 2031.00 182    182
3   2015-01-02 17:00:25.882 2031.75 249    431

Notice how instead of doing the regular cumsum(), the calculation re-starts when there's a change in Date, in the example from 2015-01-01 to 2015-01-02.

Upvotes: 2

Views: 1385

Answers (1)

Alex Riley
Alex Riley

Reputation: 176850

The easiest way would probably be to set 'Date and Time' as the index and then use groupby with TimeGrouper to group the dates. Then you can apply cumsum():

>>> df2 = df.set_index('Date and Time')
>>> df2['Volume'] = df2.groupby(pd.TimeGrouper('D'))['Volume'].cumsum()
>>> df2
                           Price  Volume
DateandTime                             
2015-01-01 17:00:00.211  2030.25     342
2015-01-01 17:00:02.456  2030.75     545
2015-01-02 17:00:00.054  2031.00     182
2015-01-02 17:00:25.882  2031.75     431

You can always reset the index again afterwards.

Upvotes: 4

Related Questions