Reputation: 195
I am calculating the value for the Total ‘1st’ Position column (table below) and would like to do this using multiple conditions.
I want Total ‘1st’ Position to reflect the number of times a given athlete has won a race (as of a given day).
For example... see below that Steve's Total 1st Position increments by 1 when: Athlete = Steve and Position = 1. I want to do this for all athletes.
I have tried the following...
df['Total 1st Position'] = ((df['Position'] == '1') & (df['Athlete'] == df['Athlete'])).cumsum()
...but this only returns a running sum of the number of times df['Position'] == '1'
What am I doing wrong?
Upvotes: 2
Views: 6105
Reputation: 1215
You can do it this way:
df = your_file
df.loc[(df['Position'] == 1), 'firsts'] = 1
df=df.fillna(0)
df['Total 1st Position'] = (df['firsts']*df['Position']).groupby(df['Athlete']).cumsum()
If we run your data frame through this we get the following:
Race Day Athlete Position firsts Total 1st Position
0 Day 1 Steve 1 1.0 1.0
1 Day 1 Jane 2 0.0 0.0
2 Day 1 Bill 3 0.0 0.0
3 Day 2 Bill 1 1.0 1.0
4 Day 2 Steve 2 0.0 1.0
5 Day 2 Jane 3 0.0 0.0
6 Day 3 Jane 1 1.0 1.0
7 Day 3 Bill 2 0.0 1.0
8 Day 3 Steve 3 0.0 1.0
9 Day 4 Steve 1 1.0 2.0
10 Day 4 Jane 2 0.0 1.0
11 Day 4 Bill 3 0.0 1.0
Upvotes: 3
Reputation: 3130
Create a temporary column to indicate wins, then use .groupby
with .cumsum
on that:
df['won'] = (df['Position'] == '1') * 1
df['Total 1st Position'] = df.groupby('Athlete').won.cumsum()
Upvotes: 2