Reputation: 5776
I have this data frame
frame = pd.DataFrame({'player1' : ['Joe', 'Steve', 'Bill', 'Doug', 'Steve','Bill','Joe','Steve'],
'player2' : ['Bill', 'Doug', 'Steve', 'Joe', 'Bill', 'Steve', 'Doug', 'Bill'],
'winner' : ['Joe','Steve' , 'Steve','Doug', 'Bill', 'Steve', 'Doug', 'Steve'],
'loser' : ['Bill', 'Doug', 'Bill', 'Joe', 'Steve', 'Bill', 'Joe', 'Bill'],
'ones' : 1})
I can keep a running total of how many times the winner has won by doing this.
frame['winners_wins'] = frame.groupby('winner')['ones'].cumsum()
I would like to keep a running count of player1's wins and losses and the same for player2. I think I should be able to do this with the groupby function but I don't know how to write it.
edit:
I didn't say it very the well the first time. I would like to keep track for each individual player. So the desired output would be:
player1 player2 winner loser player1_wins player2_wins
Joe Bill Joe Bill 1 0
Steve Doug Steve Doug 1 0
Bill Steve Steve Bill 0 2
Doug Joe Doug Joe 1 1
Steve Bill Bill Steve 2 1
Bill Steve Steve Bill 1 3
Joe Doug Doug Joe 1 2
Steve Bill Steve Bill 3 1
Upvotes: 1
Views: 711
Reputation: 879093
It looks like you want a running total of player1's
and player2's
wins. Here is a pretty mundane way to do it which uses Python more than Pandas.
Calculations that require stepping through the rows in sequence and using previous results to calculate the next row tend to not to be conducive to Pandas/Numpy operations -- cumsum
being an exception.
So I don't think there is a slick way to do this using Pandas operations, but I could be wrong.
import pandas as pd
import collections
df = pd.DataFrame({'player1' : ['Joe', 'Steve', 'Bill', 'Doug',
'Steve','Bill','Joe','Steve'], 'player2' : ['Bill',
'Doug', 'Steve', 'Joe', 'Bill', 'Steve', 'Doug', 'Bill'],
'winner' : ['Joe','Steve' , 'Steve','Doug', 'Bill',
'Steve', 'Doug', 'Steve'], 'loser' : ['Bill', 'Doug',
'Bill', 'Joe', 'Steve', 'Bill', 'Joe', 'Bill'], },
columns = ['player1', 'player2', 'winner', 'loser'])
wins = collections.Counter()
def count_wins():
for idx, row in df.iterrows():
wins[row['winner']] += 1
yield wins[row['player1']], wins[row['player2']]
df['player1_wins'], df['player2_wins'] = zip(*list(count_wins()))
print(df)
prints
player1 player2 winner loser player1_wins player2_wins
0 Joe Bill Joe Bill 1 0
1 Steve Doug Steve Doug 1 0
2 Bill Steve Steve Bill 0 2
3 Doug Joe Doug Joe 1 1
4 Steve Bill Bill Steve 2 1
5 Bill Steve Steve Bill 1 3
6 Joe Doug Doug Joe 1 2
7 Steve Bill Steve Bill 4 1
Upvotes: 1
Reputation: 35235
No need for that "ones" column or, really, for grouping.
In [19]: del frame['ones']
In [20]: frame['player1_wins'] = (frame['winner'] == frame['player1']).astype('int').cumsum()
In [21]: frame['player2_wins'] = (frame['winner'] == frame['player2']).astype('int').cumsum()
In [22]: frame
Out[22]:
loser player1 player2 winner player1_wins player2_wins
0 Bill Joe Bill Joe 1 0
1 Doug Steve Doug Steve 2 0
2 Bill Bill Steve Steve 2 1
3 Joe Doug Joe Doug 3 1
4 Steve Steve Bill Bill 3 2
5 Bill Bill Steve Steve 3 3
6 Joe Joe Doug Doug 3 4
7 Bill Steve Bill Steve 4 4
One way to get winners_wins
without resorting to a "ones" columns is this:
In [26]: frame['winners_wins'] = frame.groupby('winner').winner.transform(lambda x: np.arange(1, 1 + len(x))
Upvotes: 1