Reputation: 23
I'm trying to create a column with cumulative sum that resets to zero when the cumsum gets below zero. I the following data:
id | treatment | value |
---|---|---|
1 | drugs | 66 |
1 | drugs | 33 |
1 | drugs | -100 |
1 | drugs | 11 |
1 | drugs | 30 |
1 | drugs | -50 |
The desired result:
id | treatment | days | cumsum |
---|---|---|---|
1 | drugs | 66 | 66 |
1 | drugs | 33 | 99 |
1 | drugs | -100 | 0 |
1 | drugs | 11 | 11 |
1 | drugs | 30 | 41 |
1 | drugs | -50 | 0 |
Is there a solution close to this attempt?
df.groupby(['id','treatment']).days.apply(lambda x: 0 if x.cumsum() < 0 else x.cumsum())
Upvotes: 2
Views: 324
Reputation: 323386
I will recommend use this numba
function this time:
from numba import njit
@njit
def cumli(x, lim):
total = 0
result = []
for i, y in enumerate(x):
total += y
if total < lim:
total = 0
result.append(total)
return result
df['cumsum'] = df.groupby(['id','treatment']).days.transform(lambda x: cumli(x.values,0))
Upvotes: 1
Reputation:
Building off of @BENY's excellent answer here, you can do this:
df['cumsum'] = df.groupby(df['value'].lt(0).astype(int).diff().ne(0).cumsum())['value'].cumsum().clip(lower=0)
Output:
>>> df
id treatment value cumsum
0 1 drugs 66 66
1 1 drugs 33 99
2 1 drugs -100 0
3 1 drugs 11 11
4 1 drugs 30 41
5 1 drugs -50 0
Upvotes: 2