Dongs14173
Dongs14173

Reputation: 195

Python Pandas: Cumulative Sum based on multiple conditions

I am calculating the value for the Total ‘1st’ Position column (table below) and would like to do this using multiple conditions.

I want Total ‘1st’ Position to reflect the number of times a given athlete has won a race (as of a given day).

For example... see below that Steve's Total 1st Position increments by 1 when: Athlete = Steve and Position = 1. I want to do this for all athletes.

enter image description here

I have tried the following...

df['Total 1st Position'] = ((df['Position'] == '1') & (df['Athlete'] == df['Athlete'])).cumsum()

...but this only returns a running sum of the number of times df['Position'] == '1'

What am I doing wrong?

Upvotes: 2

Views: 6105

Answers (2)

Paula Livingstone
Paula Livingstone

Reputation: 1215

You can do it this way:

df = your_file

df.loc[(df['Position'] == 1), 'firsts'] = 1
df=df.fillna(0)

df['Total 1st Position'] = (df['firsts']*df['Position']).groupby(df['Athlete']).cumsum()

If we run your data frame through this we get the following:

   Race Day Athlete  Position  firsts  Total 1st Position
0     Day 1   Steve         1     1.0                 1.0
1     Day 1    Jane         2     0.0                 0.0
2     Day 1    Bill         3     0.0                 0.0
3     Day 2    Bill         1     1.0                 1.0
4     Day 2   Steve         2     0.0                 1.0
5     Day 2    Jane         3     0.0                 0.0
6     Day 3    Jane         1     1.0                 1.0
7     Day 3    Bill         2     0.0                 1.0
8     Day 3   Steve         3     0.0                 1.0
9     Day 4   Steve         1     1.0                 2.0
10    Day 4    Jane         2     0.0                 1.0
11    Day 4    Bill         3     0.0                 1.0

Upvotes: 3

Ken Wei
Ken Wei

Reputation: 3130

Create a temporary column to indicate wins, then use .groupby with .cumsum on that:

df['won'] = (df['Position'] == '1') * 1
df['Total 1st Position'] = df.groupby('Athlete').won.cumsum()

Upvotes: 2

Related Questions