grouping, percentage, and barchart in Python

Question

I am very new to Python. I am trying to plot a bar chart that shows the percentage of games for each winner_rank_status, and within each bar I want to show the percentage won by each winner (colour).

My dataset is like:

The code that I wrote:

Q3_df=games_df[['winner','winner_rank_status']]
Q3_df=Q3_df.groupby(['winner_rank_status','winner']).size().groupby(level=0).apply(lambda x: round(100*x/x.sum(),2))
Q3_df=Q3_df.unstack()
ax= Q3_df.plot(
    kind='bar',
    stacked=True,
    figsize=(14,7),
    rot=0,
    title='Effect of piece colour and winner rating status on the result',
    color=['black','grey','white'],
    edgecolor='black',
    
)
for c in ax.containers:
    ax.bar_label(c, label_type='center',color='b')

And this is the result that I get:

This result is wrong, as every category adds up to 100%! I need each category (Equal, Higher, Lower) to show its true percentage, and then within each category the proportion of each colour.

Would you please guide me on how I can achieve it?

I appreciate your help.

JohanC · Accepted Answer

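You can give a different color to the labels for each set of bars. In your code, x.sum() is the total within each rank group, which is why every bar adds up to 100. To get percentages where all 9 values sum to 100, you could divide by the total number of games instead: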

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np

winner_options = ['black', 'draw', 'white']
rank_options = ['lower', 'equal', 'higher']
Q3_df = pd.DataFrame({'winner_rank_status': pd.Categorical(np.random.choice(rank_options, 1000, p=[.46, .07, .47]), rank_options),
                      'winner': pd.Categorical(np.random.choice(winner_options, 1000, p=[.51, .03, .46]), winner_options)})
# Divide each (rank status, winner) count by the total number of games,
# so that all 9 values together sum to 100
Q3_rank_winner_df = (100 * Q3_df.groupby(['winner_rank_status', 'winner']).size()
                     / len(Q3_df)).round(2)
Q3_rank_winner_df = Q3_rank_winner_df.unstack()
ax = Q3_rank_winner_df.plot(
    kind='bar',
    stacked=True,
    figsize=(14, 7),
    rot=0,
    title='Effect of piece colour and winner rating status on the result',
    color=['black', 'grey', 'white'],
    edgecolor='black')
for bars, color in zip(ax.containers, ['skyblue', 'navy', 'darkblue']):
    ax.bar_label(bars, label_type='center', color=color)
ax.legend(bbox_to_anchor=[1.01, 1.02], loc='upper left')
plt.tight_layout()
plt.show()

The new requirements are a bit confusing. One option is to keep each bar normalised within its rank (so every stack sums to 100) and write each rank's share of all games at the top of its bar:


from matplotlib import pyplot as plt
import pandas as pd
import numpy as np

winner_options = ['black', 'draw', 'white']
rank_options = ['lower', 'equal', 'higher']
Q3_df = pd.DataFrame(
    {'winner_rank_status': pd.Categorical(np.random.choice(rank_options, 1000, p=[.65, .05, .30]), rank_options),
     'winner': pd.Categorical(np.random.choice(winner_options, 1000, p=[.46, .07, .47]), winner_options)})
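# Normalise within each rank status, so every stacked bar sums to 100;
# group_keys=False keeps the original (rank status, winner) index for unstack()
Q3_rank_winner_df = Q3_df.groupby(['winner_rank_status', 'winner']).size().groupby(level=0, group_keys=False).apply(
    lambda x: np.round(100 * x / x.sum(), 2))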
Q3_rank_winner_df = Q3_rank_winner_df.unstack()
ax = Q3_rank_winner_df.plot(
    kind='bar',
    stacked=True,
    figsize=(14, 7),
    rot=0,
    title='Effect of piece colour and winner rating status on the result',
    color=['black', 'grey', 'white'],
    edgecolor='black')
for bars, color in zip(ax.containers, ['skyblue', 'navy', 'darkblue']):
    ax.bar_label(bars, label_type='center', color=color)

Q3_rank_df = Q3_df.groupby(['winner_rank_status']).size() * 100 / len(Q3_df)
for row, percent in enumerate(Q3_rank_df):
    ax.text(row, 103, f'{percent:.02f} %', color='navy', ha='center', va='center')
ax.margins(y=0.08)  # more space on top

ax.legend(bbox_to_anchor=[1.01, 1.02], loc='upper left')
plt.tight_layout()
plt.show()
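
To see the difference between the two normalisations without plotting, here is a minimal sketch on a tiny hand-made dataset. The 10 rows below are hypothetical and only for illustration; they are not taken from the question's data. It swaps the groupby chain for pd.crosstab with its normalize argument, which is an alternative way to get the same tables, not the approach used in the snippets above.

```python
import pandas as pd

# Hypothetical 10-game dataset, for illustration only
games = pd.DataFrame({
    'winner_rank_status': ['higher'] * 6 + ['lower'] * 3 + ['equal'],
    'winner': ['white', 'white', 'white', 'black', 'black', 'draw',
               'white', 'black', 'black',
               'white'],
})

# Share of ALL games per (rank status, winner) cell: all cells together sum to 100
overall_pct = pd.crosstab(games['winner_rank_status'], games['winner'],
                          normalize='all') * 100

# Share WITHIN each rank status: every row sums to 100,
# so stacked bars built from this table all reach 100
within_pct = pd.crosstab(games['winner_rank_status'], games['winner'],
                         normalize='index') * 100

# Share of all games per rank status (the numbers written above the bars)
rank_pct = games['winner_rank_status'].value_counts(normalize=True) * 100

print(overall_pct.round(2))
print(within_pct.round(2))
print(rank_pct.round(2))
```

With overall_pct, the height of each stacked bar equals that rank's share of all games (the same values as rank_pct). With within_pct, every bar has height 100 and only the split between colours differs.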
