akram malek

Reputation: 59

Percentage from Total size after GROUP BY python

code_module  final_result
AAA          Distinction      44
             Fail             91
             Pass            487
             Withdrawn       126

This is the output of the following Python code:

 studentInfo.groupby(['code_module','final_result']).agg({'code_module':[np.size]})

Upvotes: 1

Views: 68

Answers (1)

jezrael

Reputation: 863166

I believe you need SeriesGroupBy.value_counts with parameter normalize:

s1 = studentInfo.groupby('code_module')['final_result'].value_counts(normalize=True)
print (s1)
code_module  final_result
AAA          Pass            0.651070
             Withdrawn       0.168449
             Fail            0.121658
             Distinction     0.058824
Name: final_result, dtype: float64
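If you want to try this without the original file, here is a minimal sketch. It assumes a stand-in studentInfo rebuilt from the four counts shown in the question (the real data is not available here), and the name pct is only illustrative. It also scales the fractions to percentages:

```python
import pandas as pd

# Hypothetical stand-in for studentInfo, rebuilt from the counts in the question.
counts = {"Distinction": 44, "Fail": 91, "Pass": 487, "Withdrawn": 126}
studentInfo = pd.DataFrame(
    {
        "code_module": "AAA",
        "final_result": [r for r, n in counts.items() for _ in range(n)],
    }
)

# Share of each final_result within its code_module, as a fraction of 1.
s1 = studentInfo.groupby("code_module")["final_result"].value_counts(normalize=True)

# Scale to percentages and round for display.
pct = s1.mul(100).round(2)
print(pct)
```

The multiplication by 100 is only a presentation choice; keep the fractions if you need them for further calculations.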

Or simplify your solution with DataFrameGroupBy.size and divide by the sum per first level of the MultiIndex:

s = studentInfo.groupby(['code_module','final_result']).size()
# s.sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
s2 = s.div(s.groupby(level=0).sum(), level=0)
print (s2)
code_module  final_result
AAA          Distinction     0.058824
             Fail            0.121658
             Pass            0.651070
             Withdrawn       0.168449
dtype: float64

The difference between the solutions is that value_counts returns the Series in descending order, so the first element in each group is the most frequent one. size does not; its output follows the order of the group keys.
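The ordering difference can be checked with a small sketch. As before, studentInfo here is a hypothetical stand-in rebuilt from the counts in the question, and by_freq, by_key and same are illustrative names:

```python
import pandas as pd

# Hypothetical stand-in for studentInfo, rebuilt from the counts in the question.
counts = {"Distinction": 44, "Fail": 91, "Pass": 487, "Withdrawn": 126}
studentInfo = pd.DataFrame(
    {
        "code_module": "AAA",
        "final_result": [r for r, n in counts.items() for _ in range(n)],
    }
)

# value_counts: within each module, the most frequent result comes first.
by_freq = studentInfo.groupby("code_module")["final_result"].value_counts(normalize=True)

# size divided by the per-module total: rows follow the sorted group keys.
s = studentInfo.groupby(["code_module", "final_result"]).size()
by_key = s.div(s.groupby(level=0).sum(), level=0)

print(list(by_freq.index.get_level_values("final_result")))
print(list(by_key.index.get_level_values("final_result")))

# Once the value_counts result is sorted by index, both hold the same values.
same = (by_freq.sort_index() - by_key).abs().max() < 1e-12
print(same)
```

So if you only care about the numbers, either approach works; calling sort_index() on the value_counts result gives the same row order as the size version.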

Upvotes: 1
