Subbu VidyaSekar
Subbu VidyaSekar

Reputation: 2615

get the percentage of each class pandas dataframe

my dataframe:

df = pd.DataFrame({'label':[0,0,1,1,2,2],"gender":['M','F','M','F','M','F'],'count':[100,200,150,210,300,220]})

I tried:

df['percent'] = 100* df['count'] / df.groupby('label')['count'].sum()

But it gives only for top 4 rows still wrong values.

I need a column with the name percent percentage of count based on label

Output:

    label   gender  count percent
0   0       M       100   33
1   0       F       200   67
2   1       M       150   41
3   1       F       210   59
4   2       M       300   57
5   2       F       220   43

Upvotes: 1

Views: 2471

Answers (3)

Ch3steR
Ch3steR

Reputation: 20669

One more option is to use set_index/reset_index trick.

d = df.set_index(['label', 'gender'])
d.mul(100).div(d.sum(level=0)).reset_index()

   label gender      count
0      0      M  33.333333
1      0      F  66.666667
2      1      M  41.666667
3      1      F  58.333333
4      2      M  57.692308
5      2      F  42.307692

For string presentation, you can use python's mini string language here.

d = df.set_index(['label', 'gender'])
d.div(d.sum(level=0))['count'].map("{:%}".format).reset_index()

   label gender       count
0      0      M  33.333333%
1      0      F  66.666667%
2      1      M  41.666667%
3      1      F  58.333333%
4      2      M  57.692308%
5      2      F  42.307692%

Upvotes: 1

sammywemmy
sammywemmy

Reputation: 28699

Another option, though more verbose than transform, is to use map:

mapping = df.label.map(df.groupby("label")["count"].sum())
df.assign(percent=df["count"].mul(100).div(mapping))

    label   gender  count   percent
0   0   M   100     33.333333
1   0   F   200     66.666667
2   1   M   150     41.666667
3   1   F   210     58.333333
4   2   M   300     57.692308
5   2   F   220     42.307692

Upvotes: 2

jezrael
jezrael

Reputation: 862791

Use GroupBy.transform for repeat aggregate values for Series with same size like original column, so possible divide:

df['percent'] = 100* df['count'] / df.groupby('label')['count'].transform('sum')
print (df)
   label gender  count    percent
0      0      M    100  33.333333
1      0      F    200  66.666667
2      1      M    150  41.666667
3      1      F    210  58.333333
4      2      M    300  57.692308
5      2      F    220  42.307692

Detail:

print (df.groupby('label')['count'].transform('sum'))
0    300
1    300
2    360
3    360
4    520
5    520
Name: count, dtype: int64

Upvotes: 4

Related Questions