Group By Median, Percentile and Percent of Total

Question

I have a dataframe that looks like this...

 ID Acuity TOTAL_ED_LOS
 1    2      423
 2    5      52
 3    5      535
 4    1      87
 ...

I would like to produce a table that looks like this:

 Acuity    Count   Median Percentile_25   Percentile_75   % of total
   1         234 ...                                         31%                                         
   2         65 ...                                           8%
   3         56 ...                                           7%
   4         345 ...                                          47%
   5         35  ...                                          5%

I already have code that will give me everything I need except for the % of total column

def percentile(n):
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_

df_grp = df_merged_v1.groupby(['Acuity'])
df_grp['TOTAL_ED_LOS'].agg(['count','median', 
                                  percentile(25), percentile(75)]).reset_index()

Is there an efficient way I can add the percent of total column? The link below contain code on how to obtain the percent of total but I'm unsure how to apply it to my code. I know that I could create two tables and then merge them but am curious if there is a cleaner way.

How to calculate count and percentage in groupby in Python

Scott Boston · Accepted Answer

Here's a one way to do it using some pandas builtin tools:

# Set random number seeed and create a dummy datafame with two columns
np.random.seed(123)
df = pd.DataFrame({'activity':np.random.choice([*'ABCDE'], 40), 
                   'TOTAL_ED_LDS':np.random.randint(50, 500, 40)})

# Reshape dataframe to get activit per column 
# then use the output from describe and transpose
df_out = df.set_index([df.groupby('activity').cumcount(),'activity'])['TOTAL_ED_LDS']\
           .unstack().describe().T

#Calculate percent count of total count
df_out['% of Total'] = df_out['count'] / df_out['count'].sum() * 100.
df_out

Output:

          count        mean         std    min     25%    50%     75%    max  % of Total
activity                                                                                
A           8.0  213.125000  106.810162   93.0  159.50  200.0  231.75  421.0        20.0
B          10.0  308.200000  116.105125   68.0  240.75  324.5  376.25  461.0        25.0
C           6.0  277.666667  117.188168  114.0  193.25  311.5  352.50  409.0        15.0
D           7.0  370.285714  124.724649  120.0  337.50  407.0  456.00  478.0        17.5
E           9.0  297.000000  160.812002   51.0  233.00  294.0  415.00  488.0        22.5

Group By Median, Percentile and Percent of Total

Answers (1)

Related Questions