How to sum multiple values in a dataframe column, if they are corresponding to 1 value in an other column

Question

I have a data frame like this:

Code  Group        Name   Number
ABC   Group_1_ABC  Mike   40
                   Amber  60
      Group_2_ABC  Rachel 90
XYZ   Group_1_XYZ  Bob    30
                   Peter  75
                   Nikki  55
      Group_2_XYZ  Julia  23
                   Ross   80
LMN   Group_1_LMN  Paul   95
.     .            .      .
.     .            .      .

I have created this data frame by grouping by code, group, name and summing the number.

Now i want to calculate the percentage of each name for a particular code. For that i want to sum all the numbers that are part of one code. I was doing this to calculate the percentage.

df['Percentage']= (df['Number']/df['??'])*100

Now for the total sum part for each group, I can`t figure out how to calculate it? I want the total sum for each code category, in order to calculate the percentage.

So for example for Code: ABC the total should be 40+60+90=190. This 190 would than be divided with all the number for each user in ABC to calculate their percentage for their respective code category. So technically the column group and name don`t have any role in calculating the total sum for each code category.

jezrael · Accepted Answer

Use GroupBy.transform by first level or by level name Code:

df['Percentage']= (df['Number']/df.groupby(level=0)['Number'].transform('sum'))*100

df['Percentage']= (df['Number']/df.groupby(level=['Code'])['Number'].transform('sum'))*100

Or in last pandas versions is not necessary specified level parameter:

df['Percentage']= (df['Number']/df.groupby('Code')['Number'].transform('sum'))*100

print (df)
                         Number  Percentage
Code Group       Name                      
ABC  Group_1_ABC Mike        40   21.052632
                 Amber       60   31.578947
     Group_2_ABC Rachel      90   47.368421
XYZ  Group_1_XYZ Bob         30   11.406844
                 Peter       75   28.517110
                 Nikki       55   20.912548
     Group_2_XYZ Julia       23    8.745247
                 Ross        80   30.418251
LMN  Group_1_LMN Paul        95  100.000000

Detail:

print (df.groupby(level=0)['Number'].transform('sum'))
Code  Group        Name  
ABC   Group_1_ABC  Mike      190
                   Amber     190
      Group_2_ABC  Rachel    190
XYZ   Group_1_XYZ  Bob       263
                   Peter     263
                   Nikki     263
      Group_2_XYZ  Julia     263
                   Ross      263
LMN   Group_1_LMN  Paul       95
Name: Number, dtype: int64

How to sum multiple values in a dataframe column, if they are corresponding to 1 value in an other column

Answers (1)

Related Questions