Reputation: 357
The left side of the figure represents the dataset. The right side of the figure represents the sketched bar chart that I want to produce. How can I plot these data to produce such a chart, where X represents the age groups and Y represents the length groups?
Any help would be great. Thanks in advance.
Upvotes: 1
Views: 230
Reputation: 16966
You could use pd.cut to convert the continuous variables into categorical values. Then, calling pd.crosstab would get you the required information.
import numpy as np
import pandas as pd
np.random.seed(42)
age = np.random.randint(40, 90, size=(200,))
length = np.random.randint(0, 30, size=(200,))
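# right=False makes the bins left-closed, so 50 falls in '50-59' rather than '<50'
age = pd.cut(age, bins=[0, 50, 60, 70, 80, np.inf], right=False, labels=['<50', '50-59', '60-69', '70-79', '80+'])
# include_lowest=True keeps a length of 0 in the first bin instead of turning it into NaN
length = pd.cut(length, bins=[0, 5, 20, 100], include_lowest=True, labels=['<=5 days', '6-20 days', '20+ days'])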
df = pd.crosstab(age, length).apply(lambda row: 100*row/row.sum(),axis=1)
df.columns.name = 'Length'
df.index.name = 'Age'
ax = df.plot(kind='bar', stacked=True, legend='reverse', figsize=(12, 8))
ax.tick_params(axis='x', rotation=0, labelsize=20)
ax.set_xlabel('Age', fontsize='xx-large')
ax.legend(loc='center left', bbox_to_anchor=(-0.4, 0.5), fontsize=20)
ax.set_ylim(0,100)
ax.set_yticklabels([f'{int(i)}%' for i in ax.get_yticks()], fontsize=20)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.set_axisbelow(True)
ax.yaxis.grid(color='gray')
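As a possible shortcut for the row-percentage step, pd.crosstab accepts a normalize argument, so the apply/lambda is not strictly needed. The following is a minimal sketch on synthetic data; the variable names (age_group, length_group, shares) and the bin edges are assumptions made for illustration, not taken from the asker's dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
age = rng.integers(40, 90, size=200)
length = rng.integers(0, 30, size=200)

# Left-closed bins (right=False), so 50 lands in '50-59' and 0 days in '<=5 days'
age_group = pd.cut(age, bins=[0, 50, 60, 70, 80, np.inf], right=False,
                   labels=['<50', '50-59', '60-69', '70-79', '80+'])
length_group = pd.cut(length, bins=[0, 6, 21, np.inf], right=False,
                      labels=['<=5 days', '6-20 days', '21+ days'])

# normalize='index' divides every row by its own total; times 100 gives percentages
shares = pd.crosstab(age_group, length_group, normalize='index') * 100
shares.index.name = 'Age'
shares.columns.name = 'Length'
print(shares.round(1))
```

The resulting table can be passed to .plot(kind='bar', stacked=True) in the same way as the frame built with apply.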
Upvotes: 1
Reputation: 62523
Use pd.cut() and pandas.DataFrame.groupby():
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# data
np.random.seed(365)
data = {'age': [np.random.randint(90) for _ in range(100)],
'length': [np.random.randint(30) for _ in range(100)]}
# dataframe
df = pd.DataFrame(data)
# age and days groups
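# right=False makes the bins left-closed: an age of 0 is kept and 50 falls in '50-59'
df['age_group'] = pd.cut(df['age'], bins=[0, 50, 60, 70, 80, 1000], right=False, labels=['<50', '50-59', '60-69', '70-79', '≥80'])
df['days'] = pd.cut(df['length'], bins=[0, 6, 20, 1000], right=False, labels=['≤5 days', '6-20 days', '≥20 days'])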
 age  length  age_group       days
  72      22      70-79   ≥20 days
   2      14        <50  6-20 days
  14      12        <50  6-20 days
  47       4        <50    ≤5 days
  18      12        <50  6-20 days
# groupby plot
# pass figsize to the plot call; a separate plt.figure() only creates an extra, empty figure
df.groupby(['age_group', 'days'])['days'].count().unstack().apply(lambda x: x*100/sum(x), axis=1).plot.bar(stacked=True, figsize=(16, 10))
plt.legend(loc='center right', bbox_to_anchor=(-0.05, 0.5))
plt.xlabel('Age Groups')
plt.xticks(rotation=0)
plt.gca().set_yticklabels(['{:.0f}%'.format(x) for x in plt.gca().get_yticks()])
plt.show()
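Which group a boundary value such as 50 lands in depends on the right parameter of pd.cut. The sketch below uses a few hypothetical ages and a shortened set of bins (both are assumptions for illustration) to show the two behaviours side by side.

```python
import pandas as pd

# Hypothetical ages sitting on or next to the bin edges
ages = pd.Series([49, 50, 59, 60])
bins = [0, 50, 60, 1000]
labels = ['<50', '50-59', '60+']

# Default right=True: intervals are (0, 50], (50, 60], (60, 1000]
closed_right = pd.cut(ages, bins=bins, labels=labels).tolist()
# right=False: intervals are [0, 50), [50, 60), [60, 1000)
closed_left = pd.cut(ages, bins=bins, labels=labels, right=False).tolist()

print(closed_right)  # ['<50', '<50', '50-59', '50-59']
print(closed_left)   # ['<50', '50-59', '50-59', '60+']
```

With labels written as '50-59', the left-closed variant is the one that matches what the labels say.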
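A note on the y-axis: the plot above runs from 0% to 100%. If the values are instead normalized from 0 to 1, multiply the tick values by 100 when formatting the labels:
plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()])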
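Another option for the percent labels is matplotlib's PercentFormatter, which formats the ticks without replacing the tick labels by hand. This is a minimal sketch, assuming a small made-up table of shares normalized from 0 to 1; the numbers and the name shares are hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import PercentFormatter

# Hypothetical shares; each row sums to 1
shares = pd.DataFrame(
    {'≤5 days': [0.50, 0.25], '6-20 days': [0.30, 0.50], '≥20 days': [0.20, 0.25]},
    index=['<50', '50-59'],
)

ax = shares.plot.bar(stacked=True)
ax.set_ylim(0, 1)
# xmax is the data value that corresponds to 100%; use xmax=100 for 0-100 data
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0, decimals=0))
plt.xticks(rotation=0)
```

Because the formatter is attached to the axis, the labels stay correct if the limits or ticks change later.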
Upvotes: 2
Reputation: 150805
You can cut each column into respective bins and do a value count:
# right=False gives left-closed bins, so the edges agree with the labels
age_bins = pd.cut(df['age'], bins=[0,50,60,70,80,1000], right=False,
                  labels=['<50','50-59','60-69','70-79','80+'])
length_bins = pd.cut(df['Length'], bins=[0,6,21, np.inf], right=False,
                     labels=['<=5 days', '6-20 days', '21+ days'])
(length_bins.groupby(age_bins)
.value_counts(normalize=True)
.unstack()
.plot.bar(stacked=True)
)
You would get something like this:
Upvotes: 2