user11992071
user11992071

Reputation:

Calculating win percentage for individual teams based on pandas df

Have a pandas df like shown below

HomeTeam    AwayTeam    Winner
Warriors    Cavaliers   1
Pistons     Rockets     0
Warriors    Rockets     1
Heat        Warriors    0

The winning team (home or away) is represented by the binary outcome in the Winner column. I want to calculate winning percentage overall for each team. How would I go about doing this?

Expected output would be something like

Team     Win%
Warriors 85%
Heat     22%
....

Upvotes: 3

Views: 1503

Answers (2)

jezrael
jezrael

Reputation: 862681

Use numpy.where for get winner team, counts values by Series.value_counts and divide by counts by all values of both columns with ravel and Series.value_counts, for division is used Series.div, then multiple by 100 and last convert Series to DataFrame by Series.rename_axis with Series.reset_index:

winners = pd.Series(np.where(df['Winner'], df['AwayTeam'], df['HomeTeam'])).value_counts()
all_teams = df[['HomeTeam', 'AwayTeam']].stack().value_counts()

df1 = winners.div(all_teams, fill_value=0).mul(100).rename_axis('Team').reset_index(name='Win%')
print (df1)
       Team   Win%
0  Cavaliers  100.0
1       Heat  100.0
2    Pistons  100.0
3    Rockets   50.0
4   Warriors    0.0

Details:

print (winners)
Rockets      1
Heat         1
Cavaliers    1
Pistons      1
dtype: int64

print (all_teams)
Warriors     3
Rockets      2
Heat         1
Cavaliers    1
Pistons      1
dtype: int64

Upvotes: 2

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 476659

We can use np.choose to select the winner, and perform .value_counts() on both the winner and both teams, and thus calculate the ratio with:

np.choose(
    df['Winner'], [df['HomeTeam'], df['AwayTeam']]
).value_counts().div(
    df[['HomeTeam', 'AwayTeam']].stack().value_counts()
).fillna(0)

Here we thus use np.choose to select the teams, and perform a value count, next we .stack() the HomeTeam and AwayTeam to obtain a series of values of the teams that played. We can then use .values_counts() to calculate how many times a team played. In case a team never appears at the winning side, that will result in a NaN. We can solve that by using 0 for these values.

With the given sample data, we obtain:

>>> np.choose(df['Winner'], [df['HomeTeam'], df['AwayTeam']]).value_counts().div(df[['HomeTeam', 'AwayTeam']].stack().value_counts()).fillna(0)
Cavaliers    1.0
Heat         1.0
Pistons      1.0
Rockets      0.5
Warriors     0.0
dtype: float64

Here the Cavaliers, Heat and Pistons won all their matches, the Rockets won half of their matches, and the Warriors did not win any match.

Upvotes: 3

Related Questions