Reputation: 324
I've got success/failure data on several simulations. Each simulation consists of several trials and I want a cumulative sum of the successes per simulation. Here's an example of my data:
data = pd.DataFrame([[0, 0, 0],
[0, 1, 0],
[0, 2, 1],
[0, 3, 0],
[1, 0, 1],
[1, 1, 0],
[1, 2, 0],
[1, 3, 1],
[2, 0, 0],
[2, 1, 1],
[2, 2, 1],
[2, 3, 1],
[0, 0, 0],
[0, 1, 1],
[0, 2, 1],
[0, 3, 0]],
columns=['simulation', 'trial', 'success'])
Using this answer, I came up with the following code but it isn't quite working and I can't figure out why.
cumsum = data['success'].cumsum()
reset = -cumsum[data['trial'] == 0].diff().fillna(cumsum)
data['cumsum'] = data['success'].where(data['trial'] != 0, reset).cumsum()
The resulting column is [0, 0, 1, 1, -1, -1, -1, 0, -1, 0, 1, 2, -1, 0, 1, 1]
but I expect [0, 0, 1, 1, 1, 1, 1, 2, 0, 1, 2, 3, 0, 1, 2, 2]
Upvotes: 0
Views: 436