user2333196
user2333196

Reputation: 5776

pandas groupby values in different column

I have this data frame

frame =  pd.DataFrame({'player1' : ['Joe', 'Steve', 'Bill', 'Doug', 'Steve','Bill','Joe','Steve'],
                      'player2' : ['Bill', 'Doug', 'Steve', 'Joe', 'Bill', 'Steve', 'Doug', 'Bill'],
                      'winner' : ['Joe','Steve' , 'Steve','Doug', 'Bill', 'Steve', 'Doug', 'Steve'],
                      'loser' : ['Bill', 'Doug', 'Bill', 'Joe', 'Steve', 'Bill', 'Joe', 'Bill'],
                       'ones' : 1})

I can keep a running total of how many times the winner has won by doing this.

frame['winners_wins'] = frame.groupby('winner')['ones'].cumsum()

I would like to keep a running count of player1's wins and losses and the same for player2. I think I should be able to do this with the groupby function but I don't know how to write it.

edit:

I didn't say it very the well the first time. I would like to keep track for each individual player. So the desired output would be:

player1 player2 winner  loser   player1_wins  player2_wins
 Joe     Bill     Joe    Bill       1             0
 Steve   Doug     Steve  Doug       1             0
 Bill    Steve    Steve  Bill       0             2
 Doug    Joe      Doug    Joe       1             1
 Steve   Bill     Bill    Steve     2             1 
 Bill    Steve    Steve   Bill      1             3
 Joe     Doug     Doug    Joe       1             2   
 Steve   Bill     Steve   Bill      3             1

Upvotes: 1

Views: 711

Answers (2)

unutbu
unutbu

Reputation: 879093

It looks like you want a running total of player1's and player2's wins. Here is a pretty mundane way to do it which uses Python more than Pandas.

Calculations that require stepping through the rows in sequence and using previous results to calculate the next row tend to not to be conducive to Pandas/Numpy operations -- cumsum being an exception. So I don't think there is a slick way to do this using Pandas operations, but I could be wrong.

import pandas as pd
import collections

df = pd.DataFrame({'player1' : ['Joe', 'Steve', 'Bill', 'Doug',
                      'Steve','Bill','Joe','Steve'], 'player2' : ['Bill',
                      'Doug', 'Steve', 'Joe', 'Bill', 'Steve', 'Doug', 'Bill'],
                      'winner' : ['Joe','Steve' , 'Steve','Doug', 'Bill',
                      'Steve', 'Doug', 'Steve'], 'loser' : ['Bill', 'Doug',
                      'Bill', 'Joe', 'Steve', 'Bill', 'Joe', 'Bill'], },
                  columns = ['player1', 'player2', 'winner', 'loser'])

wins = collections.Counter()
def count_wins():
    for idx, row in df.iterrows():
        wins[row['winner']] += 1
        yield wins[row['player1']], wins[row['player2']]
df['player1_wins'], df['player2_wins'] = zip(*list(count_wins()))
print(df)

prints

  player1 player2 winner  loser  player1_wins  player2_wins
0     Joe    Bill    Joe   Bill             1             0
1   Steve    Doug  Steve   Doug             1             0
2    Bill   Steve  Steve   Bill             0             2
3    Doug     Joe   Doug    Joe             1             1
4   Steve    Bill   Bill  Steve             2             1
5    Bill   Steve  Steve   Bill             1             3
6     Joe    Doug   Doug    Joe             1             2
7   Steve    Bill  Steve   Bill             4             1

Upvotes: 1

Dan Allan
Dan Allan

Reputation: 35235

No need for that "ones" column or, really, for grouping.

In [19]: del frame['ones']

In [20]: frame['player1_wins'] = (frame['winner'] == frame['player1']).astype('int').cumsum()

In [21]: frame['player2_wins'] = (frame['winner'] == frame['player2']).astype('int').cumsum()

In [22]: frame
Out[22]: 
   loser player1 player2 winner  player1_wins  player2_wins
0   Bill     Joe    Bill    Joe             1             0
1   Doug   Steve    Doug  Steve             2             0
2   Bill    Bill   Steve  Steve             2             1
3    Joe    Doug     Joe   Doug             3             1
4  Steve   Steve    Bill   Bill             3             2
5   Bill    Bill   Steve  Steve             3             3
6    Joe     Joe    Doug   Doug             3             4
7   Bill   Steve    Bill  Steve             4             4

One way to get winners_wins without resorting to a "ones" columns is this:

In [26]: frame['winners_wins'] = frame.groupby('winner').winner.transform(lambda x: np.arange(1, 1 + len(x))

Upvotes: 1

Related Questions