Carlo Silanu

Reputation: 79

How to count the values that fall in integer ranges in a column of a pandas DataFrame

I have a large DataFrame with many columns, such as age, name, sex, etc.

I want to make a new column with age groups 1-10, 11-20, 21-30, ..., 71-80.

I tried to do

ranges = [1, 10, 20, 30, 40, 50, 60, 70, 80]
df.age.groupby(pd.cut(df.age, ranges)).count()

and the result is

age
(1, 10]      64
(10, 20]    162
(20, 30]    361
(30, 40]    210
(40, 50]    132
(50, 60]     62
(60, 70]     27
(70, 80]      6
Name: age, dtype: int64

This is almost exactly what I wanted, but the groups look incorrect. I want 1-10 and then 11-20, not 1-10 and then 10-20. Can anybody help me solve this problem?

Upvotes: 1

Views: 322

Answers (1)

jezrael

Reputation: 862611

I think it is first necessary to explain the output, as the comment from @samthegolden does:

(10, 20] means "between 10 and 20, excluding 10 and including 20" due to the parenthesis format.
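To make the interval notation concrete, here is a minimal sketch using a few hand-picked ages that sit on the bin edges (the sample values are an assumption for illustration, not data from the question). For integer ages, (10, 20] already contains exactly 11 through 20, so the grouping is what was asked for and only the labels differ.

```python
import pandas as pd

# Illustrative ages chosen to sit on the bin edges (not from the question's data).
ages = pd.Series([1, 10, 11, 20])
binned = pd.cut(ages, [1, 10, 20])

# Age 1 falls outside (1, 10] because the left edge is excluded, so it becomes NaN.
# Age 10 lands in (1, 10]; ages 11 and 20 both land in (10, 20].
print(binned.tolist())
```

Note that the lowest edge is excluded as well, so an age of exactly 1 is not assigned to any bin by default.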

You can still get the labels you want by passing the labels parameter, building the labels from ranges with zip in a list comprehension:

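import numpy as np
import pandas as pd

# Reproducible sample data: 100 random ages from 1 to 79
np.random.seed(2020)
df = pd.DataFrame({'age':np.random.randint(1, 80, size=100)})

ranges = [1, 10, 20, 30, 40, 50, 60, 70, 80]
# Label each bin (i, j] as 'i+1-j'; the first bin is labelled from its own lower edge
labels = ['{}-{}'.format(i + 1, j) for i, j in zip(ranges[:-1], ranges[1:])]
labels[0] = '{}-{}'.format(ranges[0], ranges[1])
print(labels)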
['1-10', '11-20', '21-30', '31-40', '41-50', '51-60', '61-70', '71-80']

# Bin the ages with the custom labels and count the rows in each bin
s = df.age.groupby(pd.cut(df.age, ranges, labels=labels)).count()
print(s)

age
1-10     14
11-20    10
21-30    15
31-40    12
41-50     7
51-60    11
61-70    18
71-80    12
Name: age, dtype: int64
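One caveat: the first bin is still (1, 10], so an age of exactly 1 is not counted even though the label reads '1-10'. The counts above sum to 99 rather than 100, which would be consistent with one sampled age of 1 being dropped. The following is a sketch of one possible way to handle that, using the include_lowest parameter of pd.cut and storing the result as a new column as the question asks. The sample ages and the age_group column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical ages, including the lower edge 1 and the upper edge 80.
df = pd.DataFrame({'age': [1, 5, 10, 11, 20, 21, 80]})

ranges = [1, 10, 20, 30, 40, 50, 60, 70, 80]
labels = ['{}-{}'.format(i + 1, j) for i, j in zip(ranges[:-1], ranges[1:])]
labels[0] = '{}-{}'.format(ranges[0], ranges[1])

# include_lowest=True closes the first bin on the left, so age 1 lands in '1-10'.
df['age_group'] = pd.cut(df['age'], ranges, labels=labels, include_lowest=True)

# Count rows per group; sort_index() orders the result by bin rather than by count.
counts = df['age_group'].value_counts().sort_index()
print(counts)
```

Another option would be to start the bin edges at 0 instead of 1, which makes every label of the form i+1 to j without special-casing the first one.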

Upvotes: 1
