Getting percentage for each column after groupby

Question

I have a pandas DataFrame with two columns, A and B. Column B contains three categories: X, Y, and Z. I need to find what percentage each value of B makes up within each group in A. Here is what the DataFrame looks like:

  A   B
  AA  X 
  BB  Y
  CC  Z
  AA  Y
  AA  Y
  BB  Z 
  ..  ..

Now I want to plot a stacked bar plot, but it should be percentage-based rather than count-based, showing the share of each category in B within each group in A. Here is what I did so far:

df.groupby(['A'])['B'].value_counts().unstack() which gives me this:

B   X    Y      Z
A           
AA  65   666    5
BB  123  475    6
CC  267  1337   40

Now I want to divide each value by the sum of its corresponding row, e.g. for the first row (65/(65+666+5), 666/(65+666+5), 5/(65+666+5)), and plot the results as a stacked bar plot. Can someone please help?

Sven Harris · Accepted Answer

You can compute the row-wise sums and divide each row by its own sum, something like this:

freq_df = df.groupby(['A'])['B'].value_counts().unstack()
pct_df = freq_df.divide(freq_df.sum(axis=1), axis=0)
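To check the arithmetic, here is a minimal sketch that rebuilds the count table from the question by hand (the raw data is not shown, so the DataFrame constructor below is only a stand-in for the groupby result) and applies the same division:

```python
import pandas as pd

# Counts copied from the table in the question; built by hand here
# because the underlying raw rows are not available.
freq_df = pd.DataFrame(
    {"X": [65, 123, 267], "Y": [666, 475, 1337], "Z": [5, 6, 40]},
    index=pd.Index(["AA", "BB", "CC"], name="A"),
)
freq_df.columns.name = "B"

# Sum across the columns of each row, giving one total per group in A.
row_totals = freq_df.sum(axis=1)

# axis=0 aligns row_totals with the row index, so every value in a row
# is divided by that row's total.
pct_df = freq_df.divide(row_totals, axis=0)

print(pct_df.round(4))
```

Each row of pct_df should sum to 1; for row AA the X entry is 65/(65+666+5), matching the calculation described in the question. Multiply by 100 if you want percentages rather than fractions.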

And then to plot that you should simply be able to use:

pct_df.plot(kind="bar", stacked=True)
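As a possible shortcut, pandas can also produce the row-normalized table in one step with pd.crosstab and normalize="index". The sketch below uses a small made-up sample in the shape described in the question (the six rows are illustrative, not the asker's data) and compares both routes:

```python
import pandas as pd

# Small illustrative sample; not the asker's real data.
df = pd.DataFrame({
    "A": ["AA", "BB", "CC", "AA", "AA", "BB"],
    "B": ["X", "Y", "Z", "Y", "Y", "Z"],
})

# Route from the answer. fill_value=0 replaces the NaN that unstack()
# would otherwise leave where a category never occurs in a group.
freq_df = df.groupby("A")["B"].value_counts().unstack(fill_value=0)
pct_df = freq_df.divide(freq_df.sum(axis=1), axis=0)

# One-step alternative: normalize="index" divides each row by its total.
pct_alt = pd.crosstab(df["A"], df["B"], normalize="index")

# Scale to 0-100 if the plot axis should read as percentages.
pct_100 = pct_alt * 100
print(pct_100.round(1))
```

Either table can be passed to .plot(kind="bar", stacked=True) in the same way; with the fractions each bar stacks to 1, with pct_100 to 100.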
