Reputation: 2285
I have a pandas dataframe with two columns A
and B
. The column B
contains three categories X
, Y
, 'Z'. I need to check the how much percentage is a particular value for each group in A. Here is how the dataframe looks like:
A B
AA X
BB Y
CC Z
AA Y
AA Y
BB Z
.. ..
Now I want to plot a stacked plot but it should be a percentage based stacked plot and not just count based for each category in B
corresponding to a group in A
. Here is what I did so far:
df.groupby(['A'])['B'].value_counts().unstack()
which gives me this
B X Y Z
A
AA 65 666 5
BB 123 475 6
CC 267 1337 40
Now I want to divide each column by the sum of it's corresponding row like for first row (65/(65+666+5), 666/(65+666+5), 5/(65+666+5),)
and plot the results as stacked bar plot.
Can someone please help?
Upvotes: 2
Views: 3063
Reputation: 17
I have found a solution that works well.
Step 1. Apply groupby
Step 2. Use value_count
Step 3. Use unstack
Step 4. Multiply with 100 and apply a round function.
Step 5. Apply a fillna method to fill 0 for empty values
Here is the code
df.groupby(['A'])['B']\
.value_counts(normalize=True)\
.unstack()\
.mul(100)\
.round(2)\
.fillna(0)
Or you can alternatively use the crosstab method
Step 1. Use crosstab
Step 2. Apply a lambda function.
pd.crosstab(df["A"], df["B"])\
.apply(lambda x: round(x / x.sum() * 100, 2), axis = 1)
Upvotes: 0
Reputation: 2939
You can find the row-wise sum and divide along the axis something like this:
freq_df = df.groupby(['A'])['B'].value_counts().unstack()
pct_df = freq_df.divide(freq_df.sum(axis=1), axis=0)
And then to plot that you should simply be able to use
pct_df.plot(kind="bar", stacked=True)
Upvotes: 4