Reputation: 428
I have this Pandas group by statement:
df['teams'].groupby(train_sub['outcome']).value_counts()
which returns something like this :
outcome | teams
--------|----------------|-----
win | Man utd | 120
| Chelsea | 75
| Arsenal | 10
--------|----------------|------
loss | Man utd | 30
| Chelsea | 75
| Arsenal | 150
For each team, I want to show the percentage per outcome of the team's total (not the total entries in the dataframe). So something like this:
outcome | teams
--------|----------------|-----
win | Man utd | 0.80
| Chelsea | 0.5
| Arsenal | 0.0625
--------|----------------|------
loss | Man utd | 0.20
| Chelsea | 0.5
| Arsenal | 0.9375
Please how do I get this outcome?
Upvotes: 0
Views: 481
Reputation: 6639
Reproducing the dataset like you have:
df = pd.DataFrame()
df['outcome'] = ['win', 'win', 'win', 'loss', 'loss', 'loss']
df['teams'] = ['manu', 'chelsea', 'arsenal', 'manu', 'chelsea', 'arsenal']
df['points'] = [120, 75, 10, 30, 75, 150]
grouped = df.groupby(['outcome', 'teams'])['points'].sum()
My grouped
variable now looks like yours.
points
outcome teams
loss arsenal 150
chelsea 75
manu 30
win arsenal 10
chelsea 75
manu 120
grouped
in your case is result of df['teams'].groupby(train_sub['outcome']).value_counts()
. So, just do:
grouped / grouped.groupby(level = 1).sum()
Output:
outcome teams points
loss arsenal 0.9375
chelsea 0.5000
manu 0.2000
win arsenal 0.0625
chelsea 0.5000
manu 0.8000
Upvotes: 2