Reputation:
Have a pandas df like shown below
HomeTeam AwayTeam Winner
Warriors Cavaliers 1
Pistons Rockets 0
Warriors Rockets 1
Heat Warriors 0
The winning team (home or away) is represented by the binary outcome in the Winner column. I want to calculate winning percentage overall for each team. How would I go about doing this?
Expected output would be something like
Team Win%
Warriors 85%
Heat 22%
....
Upvotes: 3
Views: 1503
Reputation: 862681
Use numpy.where
for get winner team, counts values by Series.value_counts
and divide by counts by all values of both columns with ravel
and Series.value_counts
, for division is used Series.div
, then multiple by 100
and last convert Series
to DataFrame
by Series.rename_axis
with Series.reset_index
:
winners = pd.Series(np.where(df['Winner'], df['AwayTeam'], df['HomeTeam'])).value_counts()
all_teams = df[['HomeTeam', 'AwayTeam']].stack().value_counts()
df1 = winners.div(all_teams, fill_value=0).mul(100).rename_axis('Team').reset_index(name='Win%')
print (df1)
Team Win%
0 Cavaliers 100.0
1 Heat 100.0
2 Pistons 100.0
3 Rockets 50.0
4 Warriors 0.0
Details:
print (winners)
Rockets 1
Heat 1
Cavaliers 1
Pistons 1
dtype: int64
print (all_teams)
Warriors 3
Rockets 2
Heat 1
Cavaliers 1
Pistons 1
dtype: int64
Upvotes: 2
Reputation: 476659
We can use np.choose
to select the winner, and perform .value_counts()
on both the winner and both teams, and thus calculate the ratio with:
np.choose(
df['Winner'], [df['HomeTeam'], df['AwayTeam']]
).value_counts().div(
df[['HomeTeam', 'AwayTeam']].stack().value_counts()
).fillna(0)
Here we thus use np.choose
to select the teams, and perform a value count, next we .stack()
the HomeTeam
and AwayTeam
to obtain a series of values of the teams that played. We can then use .values_counts()
to calculate how many times a team played. In case a team never appears at the winning side, that will result in a NaN
. We can solve that by using 0
for these values.
With the given sample data, we obtain:
>>> np.choose(df['Winner'], [df['HomeTeam'], df['AwayTeam']]).value_counts().div(df[['HomeTeam', 'AwayTeam']].stack().value_counts()).fillna(0)
Cavaliers 1.0
Heat 1.0
Pistons 1.0
Rockets 0.5
Warriors 0.0
dtype: float64
Here the Cavaliers
, Heat
and Pistons
won all their matches, the Rocket
s won half of their matches, and the Warrior
s did not win any match.
Upvotes: 3