I have a data frame with, among other things, a user id and an age. I need to produce a bar chart of the number of users that fall within ranges of ages. What's throwing me is that there is really no upper bound for the age range. The specific ranges I'm trying to plot are `age <= 25`, `25 < age <= 75` and `age > 75`.
I'm relatively new to Pandas and plotting, and I'm sure this is a simple thing for more experienced data wranglers. Any assistance would be greatly appreciated.
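You can use the `pandas.cut` function for this. It accepts custom bin edges and labels, and using `float('inf')` as the last edge gives you an open-ended top bin for the ages with no upper bound.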
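A quick way to see how `cut` treats values that sit exactly on a bin edge is to feed it a few of them. By default `cut` uses `right=True`, so each bin includes its right edge, which matches the `age <= 25` and `25 < age <= 75` conditions. The `include_lowest=True` argument makes an age of exactly 0 land in the first bin instead of coming back as NaN. The sample ages and label strings below are made up for illustration.

```python
from pandas import Series, cut

# Hypothetical ages chosen to sit on and around the bin edges
ages = Series([0, 25, 25.5, 75, 75.1, 120])

# right=True (the default) gives (0, 25], (25, 75], (75, inf);
# include_lowest=True also keeps an age of exactly 0 in the first bin
binned = cut(
    ages,
    [0, 25, 75, float('inf')],
    labels=['25 and under', 'over 25 up to 75', 'over 75'],
    include_lowest=True,
)
print(binned.tolist())

# Without include_lowest, 0 is outside (0, 25] and becomes NaN
strict = cut(ages, [0, 25, 75, float('inf')])
print(strict.isna().tolist())
```

The first print shows 25 in the lowest bin and 75 in the middle one, and the second shows that only the age of 0 is dropped when `include_lowest` is left off.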
```python
from pandas import DataFrame, cut
from numpy.random import default_rng
from numpy import arange
from matplotlib.pyplot import show

# Make some dummy data
rng = default_rng(0)
df = DataFrame({'id': arange(100), 'age': rng.normal(50, scale=20, size=100).clip(min=0)})
print(df.head())
#    id        age
# 0   0  52.514604
# 1   1  47.357903
# 2   2  62.808453
# 3   3  52.098002
# 4   4  39.286613

# Use pandas.cut to bin all of the ages & assign
# these bins to a new column to demonstrate how it works
# bins are (0, 25], (25, 75], (75, inf); include_lowest=True keeps an age of exactly 0 in the first bin
df['bin'] = cut(df['age'], [0, 25, 75, float('inf')],
                labels=['25 and under', 'over 25 up to 75', 'over 75'],
                include_lowest=True)
print(df.head())
#    id        age               bin
# 0   0  52.514604  over 25 up to 75
# 1   1  47.357903  over 25 up to 75
# 2   2  62.808453  over 25 up to 75
# 3   3  52.098002  over 25 up to 75
# 4   4  39.286613  over 25 up to 75

# Get the value_counts of those bins (in bin order) and plot!
df['bin'].value_counts().sort_index().plot.bar()
show()
```
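To check the counts behind the bars without relying on random data, here is a minimal sketch with a handful of made-up ages. It bins them with the question's `age <= 25` / `25 < age <= 75` / `age > 75` split, uses matplotlib's non-interactive Agg backend so it runs without a display, and adds axis titles (the title strings are illustrative). Because the binned column is categorical, `value_counts()` reports every bin, including empty ones, and `sort_index()` puts the bars back in bin order rather than ordering them by count.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs without a display
from pandas import DataFrame, cut

# Hypothetical ages: two at or below 25, three in (25, 75], one above 75
df = DataFrame({'id': range(6), 'age': [10, 25, 30, 50, 75, 90]})
df['bin'] = cut(
    df['age'],
    [0, 25, 75, float('inf')],
    labels=['25 and under', 'over 25 up to 75', 'over 75'],
    include_lowest=True,
)

# One count per bin, kept in bin order rather than sorted by frequency
counts = df['bin'].value_counts().sort_index()
print(counts)

# Bar chart with horizontal tick labels and named axes
ax = counts.plot.bar(rot=0)
ax.set_xlabel('Age range')
ax.set_ylabel('Number of users')
```

In an interactive session you would follow this with `show()` (or `ax.figure.savefig(...)`) to see the chart.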