Pandas pivot_table to calculate share of margin

Question

I have a Pandas DataFrame named df that contains n columns. One of the columns is named COUNT, which shows how many times values in A occurs. A contains unique identifiers so every row has the value 1 in the column COUNT. It looks like this:

 A      B       C       D   E       COUNT
id1     cat1    1       a   15      1       
id2     cat2    2       b   14      1
id3     cat2    2       c   14      1
id4     cat1    1       d   15      1
id5     cat3    2       e   14      1
.....

Now I want to transform my df to look like this:

                14      15      
cat1_tot        NaN     2       
cat1_share      NaN     1       
cat2_tot        2       NaN     
cat2_share      0.6666  NaN     
cat3_tot        1       NaN     
cat3_share      0.3333  NaN
All             3       2

I can get catx_tot by using pd.pivot_table

pd.pivot_table(
     df, 
     values='COUNT', 
     index=['B'], 
     columns=['E'], 
     margins=True, 
     aggfunc=np.sum
)

But how do I add share to this?

piRSquared · Accepted Answer

combine groupby.size with groupby.transform

size = df.groupby(['B', 'E']).size()
sums = size.groupby(level='E').transform(np.sum)
aggd = pd.concat([size, size / sums], axis=1, keys=['total', 'share'])

aggd.unstack().stack(0)

to get the All row

all_ = aggd.groupby(level='E').sum().total.rename(('All', 'total'))
aggd.unstack().stack(0).append(all_)

Pandas pivot_table to calculate share of margin

Answers (1)

Related Questions