Reputation: 357

Plotting stacked bar

The left side of the figure shows the dataset, and the right side shows a sketch of the stacked bar chart I want to produce. How can I plot the data to produce such a chart, where X represents age groups and Y represents length groups?

Any help would be great. Thanks in advance.

Upvotes: 1

Answers (3)

Venkatachalam

Reputation: 16966

You could use pd.cut to convert the continuous variables into categorical values. Calling pd.crosstab on the result then gives the count of each length group within each age group.

import numpy as np
import pandas as pd

np.random.seed(42)
age = np.random.randint(40, 90, size=(200,))
length = np.random.randint(0, 30, size=(200,))

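# right=False makes the bins left-closed: [0, 50), [50, 60), ... so 50 lands in '50-59'
age = pd.cut(age, bins=[0, 50, 60, 70, 80, np.inf], right=False,
             labels=['<50', '50-59', '60-69', '70-79', '80+'])
# include_lowest=True keeps a length of 0 in the first bin: [0, 5], (5, 20], (20, 100]
length = pd.cut(length, bins=[0, 5, 20, 100], include_lowest=True,
                labels=['<=5 days', '6-20 days', '>20 days'])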


df = pd.crosstab(age, length).apply(lambda row: 100*row/row.sum(),axis=1)
df.columns.name = 'Length'
df.index.name = 'Age'

ax = df.plot(kind='bar', stacked=True, legend='reverse', figsize=(12, 8))
ax.tick_params(axis='x', rotation=0, labelsize=20)

ax.set_xlabel('Age', fontsize='xx-large') 

ax.legend(loc='center left', bbox_to_anchor=(-0.4, 0.5), fontsize=20)

ax.set_ylim(0,100) 
ax.set_yticklabels([f'{int(i)}%' for i in ax.get_yticks()],  fontsize=20)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.set_axisbelow(True)
ax.yaxis.grid(color='gray')
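
As a possible shortcut for the row-percentage step, `pd.crosstab` accepts `normalize='index'`, which divides each row by its own total. The sketch below is only an illustration: the column names `age` and `length` and the random sample data are assumptions, not the asker's actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are assumptions, not the asker's.
rng = np.random.default_rng(0)
sample = pd.DataFrame({
    'age': rng.integers(40, 90, size=200),
    'length': rng.integers(0, 30, size=200),
})

# Left-closed bins: [0, 50), [50, 60), ... so an age of 50 is labelled '50-59'.
age_group = pd.cut(sample['age'], bins=[0, 50, 60, 70, 80, np.inf], right=False,
                   labels=['<50', '50-59', '60-69', '70-79', '80+'])
# include_lowest=True keeps a length of 0 inside the first bin.
length_group = pd.cut(sample['length'], bins=[0, 5, 20, np.inf], include_lowest=True,
                      labels=['<=5 days', '6-20 days', '>20 days'])

# normalize='index' divides each row by its total; multiply by 100 for percent.
pct = pd.crosstab(age_group, length_group, normalize='index') * 100
print(pct.round(1))
```

The resulting frame can be passed to `pct.plot(kind='bar', stacked=True)` in the same way as `df` above.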

Upvotes: 1

Trenton McKinney

Reputation: 62523

Use `pd.cut()` and `pandas.DataFrame.groupby()`

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# data
np.random.seed(365)
data = {'age': [np.random.randint(90) for _ in range(100)],
        'length': [np.random.randint(30) for _ in range(100)]}

# dataframe
df = pd.DataFrame(data)

# age and days groups
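# right=False gives left-closed bins [0, 50), [50, 60), ... so ages 0 and 50 are binned as labelled
df['age_group'] = pd.cut(df['age'], bins=[0, 50, 60, 70, 80, 1000], right=False, labels=['<50', '50-59', '60-69', '70-79', '≥80'])
# include_lowest=True keeps a length of 0; the bins are [0, 5], (5, 20], (20, 1000]
df['days'] = pd.cut(df['length'], bins=[0, 5, 20, 1000], include_lowest=True, labels=['≤5 days', '6-20 days', '≥20 days'])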

 age  length age_group       days
  72      22     70-79   ≥20 days
   2      14       <50  6-20 days
  14      12       <50  6-20 days
  47       4       <50    ≤5 days
  18      12       <50  6-20 days

# groupby plot; figsize goes to the plot call because pandas creates its own figure
df.groupby(['age_group', 'days'])['days'].count().unstack().apply(lambda x: x*100/sum(x), axis=1).plot.bar(stacked=True, figsize=(16, 10))
plt.legend(loc='center right', bbox_to_anchor=(-0.05, 0.5))
plt.xlabel('Age Groups')
plt.xticks(rotation=0)
plt.gca().set_yticklabels(['{:.0f}%'.format(x) for x in plt.gca().get_yticks()]) 
plt.show()

This solution is similar to the one provided by Quang Hoang, except that its y-axis runs from 0% to 100%, while the other is normalized from 0 to 1.
The other solution can also be used, with the y-axis then formatted as percentages:
- plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()])
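
An alternative to rewriting the tick labels by hand is matplotlib's `PercentFormatter`, which formats the ticks without fixing their positions. This is a minimal sketch, assuming a row-normalized frame like the one the other answer produces; the `frac` values below are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import PercentFormatter

# Hypothetical row-normalized fractions (each row sums to 1).
frac = pd.DataFrame(
    {'<=5 days': [0.2, 0.5], '6-20 days': [0.5, 0.3], '>20 days': [0.3, 0.2]},
    index=['<50', '50-59'],
)

ax = frac.plot.bar(stacked=True, rot=0)
ax.set_ylim(0, 1)
# xmax=1 tells the formatter that a value of 1.0 corresponds to 100%.
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=0))
```

Call `plt.show()` afterwards when running interactively. For data already on a 0-100 scale, the default `xmax=100` applies instead.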

Upvotes: 2

Quang Hoang

Reputation: 150805

You can cut each column into respective bins and do a value count:

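# right=False: left-closed bins [0, 50), [50, 60), ... so 50 falls in '50-59'
age_bins = pd.cut(df['age'], bins=[0,50,60,70,80,1000], right=False,
                  labels=['<50','50-59','60-69','70-79','80+'])
# include_lowest=True: bins are [0, 5], (5, 20], (20, inf), matching the labels
length_bins = pd.cut(df['Length'], bins=[0,5,20, np.inf], include_lowest=True,
                    labels=['<=5 days', '6-20 days', '21+ days'])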

(length_bins.groupby(age_bins)
     .value_counts(normalize=True)
     .unstack()
     .plot.bar(stacked=True)
)

This gives one stacked bar per age group, with the length-group shares in each bar summing to 1.
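
One detail worth checking with `pd.cut` is which edge of each bin is closed, since the default `right=True` puts a boundary value such as 50 into the lower bin. The following small sketch, using made-up ages, compares the two settings:

```python
import pandas as pd

ages = pd.Series([49, 50, 59, 60])
bins = [0, 50, 60, 70]
labels = ['<50', '50-59', '60-69']

# Default right=True: intervals are (0, 50], (50, 60], (60, 70].
right_closed = pd.cut(ages, bins=bins, labels=labels)
# right=False: intervals are [0, 50), [50, 60), [60, 70).
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

print(pd.DataFrame({'age': ages,
                    'right=True': right_closed,
                    'right=False': left_closed}))
```

With labels like '<50' and '50-59', the `right=False` variant is the one whose intervals match the label text.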

Upvotes: 2
