enterML
enterML

Reputation: 2285

Getting percentage for each column after groupby

I have a pandas dataframe with two columns A and B. The column B contains three categories X, Y, 'Z'. I need to check the how much percentage is a particular value for each group in A. Here is how the dataframe looks like:

  A   B
  AA  X 
  BB  Y
  CC  Z
  AA  Y
  AA  Y
  BB  Z 
  ..  ..

Now I want to plot a stacked plot but it should be a percentage based stacked plot and not just count based for each category in B corresponding to a group in A. Here is what I did so far:

df.groupby(['A'])['B'].value_counts().unstack() which gives me this

B   X    Y      Z
A           
AA  65   666    5
BB  123  475    6
CC  267  1337   40

Now I want to divide each column by the sum of it's corresponding row like for first row (65/(65+666+5), 666/(65+666+5), 5/(65+666+5),)and plot the results as stacked bar plot. Can someone please help?

Upvotes: 2

Views: 3063

Answers (2)

Rahul Raoniar
Rahul Raoniar

Reputation: 17

I have found a solution that works well.

Step 1. Apply groupby

Step 2. Use value_count

Step 3. Use unstack

Step 4. Multiply with 100 and apply a round function.

Step 5. Apply a fillna method to fill 0 for empty values

Here is the code

df.groupby(['A'])['B']\
   .value_counts(normalize=True)\
   .unstack()\
   .mul(100)\
   .round(2)\
   .fillna(0)

Or you can alternatively use the crosstab method

Step 1. Use crosstab

Step 2. Apply a lambda function.

pd.crosstab(df["A"], df["B"])\
.apply(lambda x: round(x / x.sum() * 100, 2), axis = 1)

Upvotes: 0

Sven Harris
Sven Harris

Reputation: 2939

You can find the row-wise sum and divide along the axis something like this:

freq_df = df.groupby(['A'])['B'].value_counts().unstack()
pct_df = freq_df.divide(freq_df.sum(axis=1), axis=0)

And then to plot that you should simply be able to use

pct_df.plot(kind="bar", stacked=True)

Upvotes: 4

Related Questions