luminare
luminare

Reputation: 394

Get percentages of a column based off of another column but with different categories

I have the following Pandas Series:

                    Count
Pclass  Survived    
1       0            80
        1           136
2       0            97
        1            87
3       0           372
        1           119

But I want something like this:

                    Count   Percentage
Pclass  Survived       
1       0           80      37.0
        1           136     63.0
2       0           97      52.7
        1           87      47.3
3       0           372     75.8
        1           119     24.2

I want to calculate the percentages based on only the counts in each category of Pclass (not the whole sum of counts). It would be great if these percentages are calculated using only the Count column.

So far, what I did was I summed up the counts according to Pclass and used .repeat() to duplicate the values and tried to concatenate it to the original Series, which failed spectacularly.

        Count
Pclass  
1       216
1       216
2       184
2       184
3       491
3       491

My idea was to use this repeat column as the denominators in calculating the percentages, like this:

80 / 216 * 100 = 37.0%

and then remove the repeat column after the percentages are calculated. Seems so simple but I can't seem to make it work. Any help is appreciated.

Upvotes: 5

Views: 2259

Answers (3)

piRSquared
piRSquared

Reputation: 294488

div and sum with level=0

df.assign(Pct=df.div(df.sum(level=0), level=0).round(2) * 100)

                 Count   Pct
Pclass Survived             
1      0            80  37.0
       1           136  63.0
2      0            97  53.0
       1            87  47.0
3      0           372  76.0
       1           119  24.0

Upvotes: 6

ipramusinto
ipramusinto

Reputation: 2668

You are working on titanic dataset. You can also do like below on the original data (not on your Series above):

df.Survived.groupby(df.Pclass).value_counts(normalize=True)*100

It will give you count of each class in percentage rather than in actual number.

Upvotes: 0

jezrael
jezrael

Reputation: 863246

Use GroupBy.transform for return Series with same size as original DataFrame, divide by div, multiple by mul and if necessary round:

#if input is Series create one column DataFrame
df = s.to_frame('Count')

s = df.groupby('Pclass')['Count'].transform('sum')
#if use oldier pandas version
#s = df.groupby(level='Pclass')['Count'].transform('sum')

df['Percentage'] = df['Count'].div(s).mul(100).round()
print (df)
                 Count  Percentage
Pclass Survived                   
1      0            80        37.0
       1           136        63.0
2      0            97        53.0
       1            87        47.0
3      0           372        76.0
       1           119        24.0

EDIT:

dont you have to forward fill on PClass ?

It is not necessary, because default in MultiIndex are not shown, but if set multi_sparse to False is possible verify repeating values:

with pd.option_context('display.multi_sparse', False):
    print (df)
                 Count
Pclass Survived       
1      0            80
1      1           136
2      0            97
2      1            87
3      0           372
3      1           119

Upvotes: 5

Related Questions