alphanumeric

Reputation: 19339

How to classify the numbers by value in DataFrame

With:

import pandas as pd     
df = pd.DataFrame({'a':[1,2,3,4,5,12,14,121,131,298,299,1001]})
print(df.a.mean())

returns an average of all the numbers:

157.583333333

More than half of the numbers are smaller than 100. I wonder if there is a way to break the numbers into categories (essentially classifying them). I would specify the number of groups to classify the numbers into, and the function would return a list where each number is replaced with the index of its category. So the numbers smaller than 100 would be given the integer category 1, the numbers from 100 to 200 would be given category 2, and so on. Essentially some kind of rounding function that would round the numbers so that they all fall into ranges of values: from 0 to 100, from 100.1 to 200.0, and so on.

Upvotes: 1

Views: 2007

Answers (2)

user707650

import pandas as pd     
df = pd.DataFrame({'a':[1,2,3,4,5,12,14,121,131,298,299,1001]})
df['category'] = df['a'] // 100 + 1
print(df[['a', 'category']])

       a  category
0      1         1
1      2         1
2      3         1
3      4         1
4      5         1
5     12         1
6     14         1
7    121         2
8    131         2
9    298         3
10   299         3
11  1001        11
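One edge case worth noting: with floor division, an exact multiple of 100 lands in the upper category (100 // 100 + 1 gives 2), whereas the question places 100 in the first range. A minimal sketch of a ceiling-based variant follows, assuming a fixed bin width of 100 and strictly positive values (a value of 0 would get category 0). The names width, boundary and boundary_cat are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 12, 14, 121, 131, 298, 299, 1001]})

width = 100  # assumed bin width, taken from the ranges in the question

# Ceiling division keeps exact multiples (100, 200, ...) in the lower category.
df['category'] = np.ceil(df['a'] / width).astype(int)

# Boundary check on hypothetical values that are not in the original data.
boundary = pd.Series([100, 100.1, 200])
boundary_cat = np.ceil(boundary / width).astype(int)

print(df[['a', 'category']])
print(boundary_cat.tolist())
```

For the data in the question this gives the same categories as the floor-division version; the two only differ on exact multiples of the width.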

Upvotes: 4

Zeugma

Reputation: 32105

Use pd.cut. The bins= argument lets you define the number of equal-width categories to split the data into. The result is a Series of bin ranges:

pd.cut(df.a, bins=10)
Out[156]: 
0        (0, 101]
1        (0, 101]
2        (0, 101]
3        (0, 101]
4        (0, 101]
5        (0, 101]
6        (0, 101]
7      (101, 201]
8      (101, 201]
9      (201, 301]
10     (201, 301]
11    (901, 1001]
Name: a, dtype: category
Categories (10, object): [(0, 101] < (101, 201] < (201, 301] < (301, 401] ... (601, 701] < (701, 801] < (801, 901] < (901, 1001]]
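Since the question asks for an integer category index rather than a range, pd.cut can also return the bin number directly through its labels=False option. The sketch below assumes the same ten equal-width bins as above and adds 1 so that numbering starts at 1, as the question requested. The name codes is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 12, 14, 121, 131, 298, 299, 1001]})

# labels=False returns the 0-based bin index instead of an interval label.
# Adding 1 shifts the numbering so the lowest bin is category 1.
codes = pd.cut(df['a'], bins=10, labels=False) + 1

df['category'] = codes
print(df[['a', 'category']])
```

Note that with bins=10 the edges are derived from the minimum and maximum of the data, so 1001 falls into category 10 here, not 11 as in the floor-division answer. Passing an explicit list of edges to bins= would pin the ranges regardless of the data.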

Upvotes: 3
