Pandas cumulative sum reset based on percentage of the last checkpoint

Question

Sample Data

date_rng = pd.date_range('2019-01-01', freq='s', periods=400)
df = pd.DataFrame(np.random.lognormal(.005, .5, size=(len(date_rng))),
                  columns=['data1'],
                  index=date_rng)

Example input

                    data1
2019-01-01 00:00:00 1
2019-01-01 00:00:01 -2
2019-01-01 00:00:02 2
2019-01-01 00:00:03 3
2019-01-01 00:00:04 1
2019-01-01 00:00:05 2
2019-01-01 00:00:06 -1
2019-01-01 00:00:07 3
2019-01-01 00:00:08 4
2019-01-01 00:00:09 5
2019-01-01 00:00:10 7
2019-01-01 00:00:11 2
2019-01-01 00:00:12 4
2019-01-01 00:00:13 -1
2019-01-01 00:00:14 5
2019-01-01 00:00:15 3
2019-01-01 00:00:16 5
2019-01-01 00:00:17 -3
... ...

Expected output

                    data1 cumsum
2019-01-01 00:00:00 1     1
2019-01-01 00:00:01 -2    -1
2019-01-01 00:00:02 2     1
2019-01-01 00:00:03 3     4
2019-01-01 00:00:04 1     5 (reset cumsum at this point)
2019-01-01 00:00:05 2     2
2019-01-01 00:00:06 -1    1
2019-01-01 00:00:07 3     4
2019-01-01 00:00:08 4     8 (reset at this point)
2019-01-01 00:00:09 5     5 (reset at this point)
2019-01-01 00:00:10 7     7 (reset at this point)
2019-01-01 00:00:11 2     2
2019-01-01 00:00:12 4     6 (reset at this point)
2019-01-01 00:00:13 -1    -1
2019-01-01 00:00:14 5     4
2019-01-01 00:00:15 3     7 (reset at this point)
2019-01-01 00:00:16 1     1
2019-01-01 00:00:17 -3    -2
... ...

I want to calculate the cumulative sum of the data1 based on the condtiontion that it will reset if it larger than 5 and exceed 20% of the last reset. For example: for the first calculation, it will only reset if it's larger than 5, after that, it will reset based on both conditions.

I have checked out some other answers on stackoverflow but I didn't find similar problems. Please advice me on how to solve this problem.

Pandas cumulative sum reset based on percentage of the last checkpoint

Answers (1)

Related Questions