Reputation: 2615
my dataframe:
df = pd.DataFrame({'label':[0,0,1,1,2,2],"gender":['M','F','M','F','M','F'],'count':[100,200,150,210,300,220]})
I tried:
df['percent'] = 100* df['count'] / df.groupby('label')['count'].sum()
But it gives only for top 4 rows still wrong values.
I need a column with the name percent
percentage of count based on label
Output:
label gender count percent
0 0 M 100 33
1 0 F 200 67
2 1 M 150 41
3 1 F 210 59
4 2 M 300 57
5 2 F 220 43
Upvotes: 1
Views: 2471
Reputation: 20669
One more option is to use set_index
/reset_index
trick.
d = df.set_index(['label', 'gender'])
d.mul(100).div(d.sum(level=0)).reset_index()
label gender count
0 0 M 33.333333
1 0 F 66.666667
2 1 M 41.666667
3 1 F 58.333333
4 2 M 57.692308
5 2 F 42.307692
For string presentation, you can use python's mini string language
here.
d = df.set_index(['label', 'gender'])
d.div(d.sum(level=0))['count'].map("{:%}".format).reset_index()
label gender count
0 0 M 33.333333%
1 0 F 66.666667%
2 1 M 41.666667%
3 1 F 58.333333%
4 2 M 57.692308%
5 2 F 42.307692%
Upvotes: 1
Reputation: 28699
Another option, though more verbose than transform
, is to use map
:
mapping = df.label.map(df.groupby("label")["count"].sum())
df.assign(percent=df["count"].mul(100).div(mapping))
label gender count percent
0 0 M 100 33.333333
1 0 F 200 66.666667
2 1 M 150 41.666667
3 1 F 210 58.333333
4 2 M 300 57.692308
5 2 F 220 42.307692
Upvotes: 2
Reputation: 862791
Use GroupBy.transform
for repeat aggregate values for Series with same size like original column, so possible divide:
df['percent'] = 100* df['count'] / df.groupby('label')['count'].transform('sum')
print (df)
label gender count percent
0 0 M 100 33.333333
1 0 F 200 66.666667
2 1 M 150 41.666667
3 1 F 210 58.333333
4 2 M 300 57.692308
5 2 F 220 42.307692
Detail:
print (df.groupby('label')['count'].transform('sum'))
0 300
1 300
2 360
3 360
4 520
5 520
Name: count, dtype: int64
Upvotes: 4