stevemo
stevemo

Reputation: 1097

Pandas groupby on same column twice

Say I have a dataset like

z = pd.DataFrame({'match': [1,1,2,2,3,3,4,4],
                  'group': [0,1,1,2,0,2,1,2],
                  'win':   [0,1,1,0,0,1,0,1]})

Notice, for example, that group "1" and "2" are paired up twice (in match 2 and 4), and in this matchup, group 2 has a 50% win rate.

I want an array that shows the avg win rate of each group pairing, something like

    0    1    2
0   -   0.0  0.0
1  1.0   -   0.5
2  1.0  0.5   -

but I have no idea how to get there with standard groupby and agg methods.

Upvotes: 0

Views: 559

Answers (1)

akuiper
akuiper

Reputation: 215137

Option one (Updated):

left = z.assign(
  _id = lambda x: x.groupby('match').cumcount()
).pivot('match', '_id',  'group').join(
  z.groupby('match').win.first()
)

# for each match create another row with players reversed and assign 
# the opposite score
right = left.rename(columns={0: 1, 1:0}).assign(win=lambda x: 1-x.win)

left.append(right).groupby([0,1]).win.mean().unstack()

1    0    1    2
0               
0  NaN  0.0  0.0
1  1.0  NaN  0.5
2  1.0  0.5  NaN

Option two (original answer):

  1. Pivot data frame so each row is a match;
  2. The win rate of one player over another is basically the average of its scores where both players are not null, i.e. in a match;
  3. Use apply to calculate all pair wise win rates;
p = z.pivot('match', 'group', 'win')

# calculate the win rate of x over y
def win_rate(x, y):
    if x.name == y.name:
        return None
    return x[x.notnull() & y.notnull()].mean()

# calculate pair wise win rate
p.apply(lambda x: p.apply(lambda y: win_rate(y, x)))

group    0    1    2
group               
0      NaN  0.0  0.0
1      1.0  NaN  0.5
2      1.0  0.5  NaN

Upvotes: 2

Related Questions