Reputation: 859
I have a dataframe that looks like this...
ID Acuity TOTAL_ED_LOS
1 2 423
2 5 52
3 5 535
4 1 87
...
I would like to produce a table that looks like this:
Acuity Count Median Percentile_25 Percentile_75 % of total
1 234 ... 31%
2 65 ... 8%
3 56 ... 7%
4 345 ... 47%
5 35 ... 5%
I already have code that will give me everything I need except for the % of total column
def percentile(n):
def percentile_(x):
return np.percentile(x, n)
percentile_.__name__ = 'percentile_%s' % n
return percentile_
df_grp = df_merged_v1.groupby(['Acuity'])
df_grp['TOTAL_ED_LOS'].agg(['count','median',
percentile(25), percentile(75)]).reset_index()
Is there an efficient way I can add the percent of total column? The link below contain code on how to obtain the percent of total but I'm unsure how to apply it to my code. I know that I could create two tables and then merge them but am curious if there is a cleaner way.
How to calculate count and percentage in groupby in Python
Upvotes: 0
Views: 140
Reputation: 153460
Here's a one way to do it using some pandas builtin tools:
# Set random number seeed and create a dummy datafame with two columns
np.random.seed(123)
df = pd.DataFrame({'activity':np.random.choice([*'ABCDE'], 40),
'TOTAL_ED_LDS':np.random.randint(50, 500, 40)})
# Reshape dataframe to get activit per column
# then use the output from describe and transpose
df_out = df.set_index([df.groupby('activity').cumcount(),'activity'])['TOTAL_ED_LDS']\
.unstack().describe().T
#Calculate percent count of total count
df_out['% of Total'] = df_out['count'] / df_out['count'].sum() * 100.
df_out
Output:
count mean std min 25% 50% 75% max % of Total
activity
A 8.0 213.125000 106.810162 93.0 159.50 200.0 231.75 421.0 20.0
B 10.0 308.200000 116.105125 68.0 240.75 324.5 376.25 461.0 25.0
C 6.0 277.666667 117.188168 114.0 193.25 311.5 352.50 409.0 15.0
D 7.0 370.285714 124.724649 120.0 337.50 407.0 456.00 478.0 17.5
E 9.0 297.000000 160.812002 51.0 233.00 294.0 415.00 488.0 22.5
Upvotes: 2