iKey
iKey

Reputation: 428

Pandas Groupby - Calculate percentage of values per group total value

I have this Pandas group by statement:

df['teams'].groupby(train_sub['outcome']).value_counts()

which returns something like this :

outcome | teams 
--------|----------------|-----
  win   | Man utd        | 120
        | Chelsea        | 75
        | Arsenal        | 10
--------|----------------|------
  loss  | Man utd        | 30
        | Chelsea        | 75
        | Arsenal        | 150

For each team, I want to show the percentage per outcome of the team's total (not the total entries in the dataframe). So something like this:

outcome | teams 
--------|----------------|-----
  win   | Man utd        | 0.80
        | Chelsea        | 0.5
        | Arsenal        | 0.0625
--------|----------------|------
  loss  | Man utd        | 0.20
        | Chelsea        | 0.5
        | Arsenal        | 0.9375

Please how do I get this outcome?

Upvotes: 0

Views: 481

Answers (1)

Ankur Sinha
Ankur Sinha

Reputation: 6639

Reproducing the dataset like you have:

df = pd.DataFrame()
df['outcome'] = ['win', 'win', 'win', 'loss', 'loss', 'loss']
df['teams'] = ['manu', 'chelsea', 'arsenal', 'manu', 'chelsea', 'arsenal']
df['points'] = [120, 75, 10, 30, 75, 150]
grouped = df.groupby(['outcome', 'teams'])['points'].sum()

My grouped variable now looks like yours.

                 points
outcome teams          
loss    arsenal     150
        chelsea      75
        manu         30
win     arsenal      10
        chelsea      75
        manu        120


Solution:

grouped in your case is result of df['teams'].groupby(train_sub['outcome']).value_counts(). So, just do:

grouped / grouped.groupby(level = 1).sum()

Output:

outcome teams    points     
loss    arsenal  0.9375
        chelsea  0.5000
        manu     0.2000
win     arsenal  0.0625
        chelsea  0.5000
        manu     0.8000

Upvotes: 2

Related Questions