jkb

Reputation: 2450

How can I create a bar chart with ranges of values?

I have a data frame with, among other things, a user id and an age. I need to produce a bar chart of the number of users that fall within ranges of ages. What's throwing me is that there is really no upper bound for the age range. The specific ranges I'm trying to plot are age <= 25, 25 < age <= 75 and age > 75.

I'm relatively new to Pandas and plotting, and I'm sure this is a simple thing for more experienced data wranglers. Any assistance would be greatly appreciated.

Upvotes: 1

Views: 1650

Answers (1)

Cameron Riddell

Reputation: 13417

You can use the pandas.cut function to do this, and you can supply custom bin edges and labels!

from pandas import DataFrame, cut
from numpy.random import default_rng
from numpy import arange
from matplotlib.pyplot import show

# Make some dummy data
rng = default_rng(0)
df = DataFrame({'id': arange(100), 'age': rng.normal(50, scale=20, size=100).clip(min=0)})

print(df.head())
   id        age
0   0  52.514604
1   1  47.357903
2   2  62.808453
3   3  52.098002
4   4  39.286613

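# Use pandas.cut to bin all of the ages & assign
#   these bins to a new column to demonstrate how it works
## cut uses right-closed bins by default: (0, 25], (25, 75], (75, inf)
## include_lowest=True also puts an age of exactly 0 into the first bin
df['bin'] = cut(df['age'], [0, 25, 75, float('inf')], labels=['under 25', '25 up to 75', '75 or older'], include_lowest=True)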
print(df.head())
   id        age          bin
0   0  52.514604  25 up to 75
1   1  47.357903  25 up to 75
2   2  62.808453  25 up to 75
3   3  52.098002  25 up to 75
4   4  39.286613  25 up to 75

# Get the value_counts of those bins and plot!
df['bin'].value_counts().sort_index().plot.bar()
show()
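
To check which bin a boundary age lands in, here is a minimal sketch using a handful of made-up ages that sit on or near the bin edges. The ages and the label strings are hypothetical, chosen only to mirror the ranges in the question (age <= 25, 25 < age <= 75, age > 75); it relies on pandas.cut's default right-closed intervals.

```python
import pandas as pd

# Hypothetical ages placed on and just past the bin edges
ages = pd.Series([0, 25, 25.5, 75, 75.5, 120], name="age")

# Illustrative labels matching the ranges asked about in the question
labels = ["25 or under", "over 25 to 75", "over 75"]

# right=True is the default, so each bin includes its upper edge;
# include_lowest=True keeps an age of exactly 0 in the first bin
binned = pd.cut(
    ages,
    bins=[0, 25, 75, float("inf")],
    labels=labels,
    include_lowest=True,
)

# Count users per bin, ordered by bin rather than by frequency
counts = binned.value_counts().sort_index()
print(counts)
```

Calling counts.plot.bar() on this result should draw the same kind of chart as above, with the bars in bin order.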


Upvotes: 4
