Reputation: 2724
I have a Pandas
DataFrame
named df
that contains n
columns
. One of the columns
is named COUNT
, which shows how many times values in A
occurs. A
contains unique identifiers so every row has the value 1
in the column
COUNT
. It looks like this:
A B C D E COUNT
id1 cat1 1 a 15 1
id2 cat2 2 b 14 1
id3 cat2 2 c 14 1
id4 cat1 1 d 15 1
id5 cat3 2 e 14 1
.....
Now I want to transform my df
to look like this:
14 15
cat1_tot NaN 2
cat1_share NaN 1
cat2_tot 2 NaN
cat2_share 0.6666 NaN
cat3_tot 1 NaN
cat3_share 0.3333 NaN
All 3 2
I can get catx_tot
by using pd.pivot_table
pd.pivot_table(
df,
values='COUNT',
index=['B'],
columns=['E'],
margins=True,
aggfunc=np.sum
)
But how do I add share to this?
Upvotes: 3
Views: 1665
Reputation: 294488
combine groupby.size
with groupby.transform
size = df.groupby(['B', 'E']).size()
sums = size.groupby(level='E').transform(np.sum)
aggd = pd.concat([size, size / sums], axis=1, keys=['total', 'share'])
aggd.unstack().stack(0)
to get the All
row
all_ = aggd.groupby(level='E').sum().total.rename(('All', 'total'))
aggd.unstack().stack(0).append(all_)
Upvotes: 1