Shamsul Masum

Reputation: 357

Plotting stacked bar

[Figure: the dataset (left) and a sketch of the desired stacked bar chart (right)]

The left side of the figure shows the dataset; the right side shows a sketch of the bar chart I want to produce. How can I plot these data to produce a stacked bar chart like the sketch, where X represents age groups and Y represents length groups?

Any help would be great. Thanks in advance.

Upvotes: 1

Views: 230

Answers (3)

Venkatachalam

Reputation: 16966

You could use pd.cut to convert each continuous variable into categorical bins. Then pd.crosstab counts every age/length combination, and normalizing each row gives the percentages to stack.

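import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

np.random.seed(42)
age = np.random.randint(40, 90, size=(200,))
length = np.random.randint(0, 30, size=(200,))

# right=False makes each bin [left, right), so an age of 50 falls in '50-59', not '<50'
age = pd.cut(age, bins=[0, 50, 60, 70, 80, np.inf], right=False,
             labels=['<50', '50-59', '60-69', '70-79', '80+'])
# include_lowest=True keeps a length of 0 in the first bin instead of turning it into NaN
length = pd.cut(length, bins=[0, 5, 20, 100], include_lowest=True,
                labels=['<=5 days', '6-20 days', '21+ days'])

# count each (age, length) pair, then convert every row to percentages
df = pd.crosstab(age, length).apply(lambda row: 100 * row / row.sum(), axis=1)
df.columns.name = 'Length'
df.index.name = 'Age'

ax = df.plot(kind='bar', stacked=True, figsize=(12, 8))
ax.tick_params(axis='x', rotation=0, labelsize=20)
ax.tick_params(axis='y', labelsize=20)
ax.set_xlabel('Age', fontsize='xx-large')

# reverse the legend so its order matches the stacking order (top segment first)
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[::-1], labels[::-1], loc='center left', bbox_to_anchor=(-0.4, 0.5), fontsize=20)

ax.set_ylim(0, 100)
# label the ticks as percentages without freezing the tick positions
ax.yaxis.set_major_formatter(mtick.PercentFormatter(xmax=100))
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.set_axisbelow(True)
ax.yaxis.grid(color='gray')
plt.show()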

[Figure: resulting 100% stacked bar chart of length groups per age group]
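As a possible shortcut (a sketch, not part of the answer above), pd.crosstab can do the row normalization itself through its normalize='index' parameter, which replaces the apply(lambda row: ...) step. The random data here is made up and only mirrors the ranges used above.

```python
import numpy as np
import pandas as pd

# hypothetical random data with the same ranges as in the answer above
rng = np.random.default_rng(0)
age = pd.cut(rng.integers(40, 90, size=200), bins=[0, 50, 60, 70, 80, np.inf],
             right=False, labels=['<50', '50-59', '60-69', '70-79', '80+'])
length = pd.cut(rng.integers(0, 30, size=200), bins=[0, 5, 20, np.inf],
                include_lowest=True, labels=['<=5 days', '6-20 days', '21+ days'])

# normalize='index' divides each row by its total (rows sum to 1); * 100 gives percent
pct = pd.crosstab(age, length, rownames=['Age'], colnames=['Length'],
                  normalize='index') * 100
print(pct.round(1))
```

The resulting frame can be passed straight to pct.plot(kind='bar', stacked=True).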

Upvotes: 1

Trenton McKinney

Reputation: 62523

Use pd.cut() to bin both columns and pandas.DataFrame.groupby() to count each combination:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

# data
np.random.seed(365)
data = {'age': [np.random.randint(90) for _ in range(100)],
        'length': [np.random.randint(30) for _ in range(100)]}

# dataframe
df = pd.DataFrame(data)

# age and days groups; right=False makes every bin [left, right), e.g. 50 goes to '50-59'
df['age_group'] = pd.cut(df['age'], bins=[0, 50, 60, 70, 80, 1000], right=False, labels=['<50', '50-59', '60-69', '70-79', '≥80'])
df['days'] = pd.cut(df['length'], bins=[0, 6, 21, 1000], right=False, labels=['≤5 days', '6-20 days', '≥21 days'])

# df.head() looks like this (values depend on the random draw)
 age  length age_group       days
  72      22     70-79   ≥21 days
   2      14       <50  6-20 days
  14      12       <50  6-20 days
  47       4       <50    ≤5 days
  18      12       <50  6-20 days

# groupby plot: count each (age_group, days) pair, then convert each row to percentages
fig, ax = plt.subplots(figsize=(16, 10))
(df.groupby(['age_group', 'days'], observed=False)
   .size()
   .unstack()
   .apply(lambda x: x * 100 / x.sum(), axis=1)
   .plot.bar(stacked=True, ax=ax))
ax.legend(loc='center right', bbox_to_anchor=(-0.05, 0.5))
ax.set_xlabel('Age Groups')
ax.tick_params(axis='x', labelrotation=0)
ax.yaxis.set_major_formatter(mtick.PercentFormatter(xmax=100))
plt.show()

[Figure: resulting 100% stacked bar chart of day groups per age group]

  • This solution is similar to the one provided by Quang Hoang, except that its y-axis runs from 0% to 100%, while the other is normalized from 0 to 1.
  • The other solution can also be used, with the y-axis then formatted as percentages:
    • plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1)), after import matplotlib.ticker as mtick
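To illustrate the 0-to-1 case, here is a minimal sketch using matplotlib's PercentFormatter with xmax=1. The proportions in props are made-up values standing in for the output of value_counts(normalize=True).unstack().

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import pandas as pd

# hypothetical proportions: each row sums to 1, like value_counts(normalize=True).unstack()
props = pd.DataFrame(
    {'<=5 days': [0.2, 0.5], '6-20 days': [0.5, 0.3], '21+ days': [0.3, 0.2]},
    index=pd.Index(['<50', '50-59'], name='age'),
)

fig, ax = plt.subplots()
props.plot.bar(stacked=True, ax=ax)
ax.set_ylim(0, 1)
# xmax=1 tells the formatter that a data value of 1.0 means 100%
ax.yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1))
plt.show()
```

Unlike set_yticklabels, the formatter is applied to whatever ticks Matplotlib chooses, so the labels stay correct if the axis limits change later.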

Upvotes: 2

Quang Hoang

Reputation: 150805

You can cut each column into its respective bins, then count the length bins within each age group:

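import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# right=False makes each age bin [left, right), so 50 falls in '50-59'
age_bins = pd.cut(df['age'], bins=[0, 50, 60, 70, 80, 1000], right=False,
                  labels=['<50', '50-59', '60-69', '70-79', '80+'])
# (5, 20] holds 6-20 days; include_lowest=True keeps a length of 0 in the first bin
length_bins = pd.cut(df['Length'], bins=[0, 5, 20, np.inf], include_lowest=True,
                     labels=['<=5 days', '6-20 days', '21+ days'])

(length_bins.groupby(age_bins, observed=False)
     .value_counts(normalize=True)
     .unstack()
     .plot.bar(stacked=True)
)
plt.show()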

You would get something like this:

[Figure: stacked bar chart of the normalized length-group counts per age group]
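Which group a boundary value lands in depends on pd.cut's right and include_lowest parameters. By default bins are closed on the right, (left, right], so an age of exactly 50 is labelled '<50' and a value equal to the first edge becomes NaN. A small sketch with made-up ages:

```python
import pandas as pd

ages = pd.Series([0, 49, 50, 59, 60, 80])  # hypothetical boundary values
edges = [0, 50, 60, 70, 80, 1000]
labels = ['<50', '50-59', '60-69', '70-79', '80+']

# default right=True: bins are (0, 50], (50, 60], ... so 50 -> '<50' and 0 -> NaN
default_bins = pd.cut(ages, bins=edges, labels=labels)

# include_lowest=True: the first bin also contains its left edge, so 0 -> '<50'
lowest_kept = pd.cut(ages, bins=edges, labels=labels, include_lowest=True)

# right=False: bins are [0, 50), [50, 60), ... which matches the labels
left_closed = pd.cut(ages, bins=edges, labels=labels, right=False)

print(pd.DataFrame({'age': ages, 'default': default_bins,
                    'include_lowest': lowest_kept, 'right=False': left_closed}))
```

For integer data, right=False is usually the option that makes labels such as '50-59' mean exactly what they say.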

Upvotes: 2
