Reputation: 97
I have a data frame like this:
Code Group Name Number
ABC Group_1_ABC Mike 40
Amber 60
Group_2_ABC Rachel 90
XYZ Group_1_XYZ Bob 30
Peter 75
Nikki 55
Group_2_XYZ Julia 23
Ross 80
LMN Group_1_LMN Paul 95
. . . .
. . . .
I have created this data frame by grouping by code, group, name and summing the number.
Now i want to calculate the percentage of each name for a particular code. For that i want to sum all the numbers that are part of one code. I was doing this to calculate the percentage.
df['Percentage']= (df['Number']/df['??'])*100
Now for the total sum part for each group, I can`t figure out how to calculate it? I want the total sum for each code category, in order to calculate the percentage.
So for example for Code: ABC the total should be 40+60+90=190. This 190 would than be divided with all the number for each user in ABC to calculate their percentage for their respective code category. So technically the column group and name don`t have any role in calculating the total sum for each code category.
Upvotes: 0
Views: 477
Reputation: 863741
Use GroupBy.transform
by first level or by level name Code
:
df['Percentage']= (df['Number']/df.groupby(level=0)['Number'].transform('sum'))*100
df['Percentage']= (df['Number']/df.groupby(level=['Code'])['Number'].transform('sum'))*100
Or in last pandas versions is not necessary specified level parameter:
df['Percentage']= (df['Number']/df.groupby('Code')['Number'].transform('sum'))*100
print (df)
Number Percentage
Code Group Name
ABC Group_1_ABC Mike 40 21.052632
Amber 60 31.578947
Group_2_ABC Rachel 90 47.368421
XYZ Group_1_XYZ Bob 30 11.406844
Peter 75 28.517110
Nikki 55 20.912548
Group_2_XYZ Julia 23 8.745247
Ross 80 30.418251
LMN Group_1_LMN Paul 95 100.000000
Detail:
print (df.groupby(level=0)['Number'].transform('sum'))
Code Group Name
ABC Group_1_ABC Mike 190
Amber 190
Group_2_ABC Rachel 190
XYZ Group_1_XYZ Bob 263
Peter 263
Nikki 263
Group_2_XYZ Julia 263
Ross 263
LMN Group_1_LMN Paul 95
Name: Number, dtype: int64
Upvotes: 1