Reputation: 19339
With:
import pandas as pd
df = pd.DataFrame({'a':[1,2,3,4,5,12,14,121,131,298,299,1001]})
print df.a.mean()
returns an average of all the numbers:
157.583333333
Half of the numbers are smaller than 100. I wonder if there is a way to break the numbers into categories (essentially classifying them). I would specify the number of groups to classify the numbers into, and the function would return a list where each number is replaced with the corresponding category's index. So the numbers smaller than 100 would be given integer category 1, the numbers from 100 to 200 would be given category 2, and so on. Essentially, some kind of rounding function that would round the numbers so that they all fall into ranges of values: from 0 to 100, from 100.1 to 200.0, etc.
Upvotes: 1
Views: 2007
Reputation:
import pandas as pd
df = pd.DataFrame({'a':[1,2,3,4,5,12,14,121,131,298,299,1001]})
df['category'] = df['a'] // 100 + 1
print(df[['a', 'category']])
       a  category
0      1         1
1      2         1
2      3         1
3      4         1
4      5         1
5     12         1
6     14         1
7    121         2
8    131         2
9    298         3
10   299         3
11  1001        11
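One caveat with floor division: a value of exactly 100 lands in category 2, while the question puts 0 to 100 in category 1. A minimal sketch of a ceiling-based variant follows, assuming Python 3 division and positive inputs. The helper name `categorize` and its `width` parameter are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

def categorize(values, width=100):
    """Hypothetical helper: map each value to a 1-based bucket of size `width`.

    The upper edge is inclusive, so 100 -> 1 and 100.1 -> 2.
    Values at or below 0 are clipped into category 1 (an assumption).
    """
    return np.ceil(values / width).astype(int).clip(lower=1)

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 12, 14, 121, 131, 298, 299, 1001]})
df['category'] = categorize(df['a'])
print(df[['a', 'category']])
```

For the sample data this gives the same categories as the floor-division version; the two only differ at exact multiples of 100.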
Upvotes: 4
Reputation: 32105
Use pd.cut. The bins= argument allows you to define the number of categories to get. The result is a series with bin ranges:
pd.cut(df.a, bins=10)
Out[156]:
0 (0, 101]
1 (0, 101]
2 (0, 101]
3 (0, 101]
4 (0, 101]
5 (0, 101]
6 (0, 101]
7 (101, 201]
8 (101, 201]
9 (201, 301]
10 (201, 301]
11 (901, 1001]
Name: a, dtype: category
Categories (10, object): [(0, 101] < (101, 201] < (201, 301] < (301, 401] ... (601, 701] < (701, 801] < (801, 901] < (901, 1001]]
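Since the question asks for an integer category index rather than interval labels, here is a rough sketch of two ways to get that from pd.cut. Passing labels=False makes it return 0-based bin indices, so 1 is added to match the numbering in the question. The column names `category_equal` and `category_fixed` and the choice of edges in `edges` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 12, 14, 121, 131, 298, 299, 1001]})

# Option 1: ask for 10 equal-width bins over the observed range of the data.
# labels=False returns 0-based integer bin indices instead of intervals.
df['category_equal'] = pd.cut(df['a'], bins=10, labels=False) + 1

# Option 2: fix the edges yourself, giving (0, 100], (100, 200], ... (1000, 1100].
# The upper bound of 1100 is an assumption chosen to cover the sample data.
edges = list(range(0, 1101, 100))
df['category_fixed'] = pd.cut(df['a'], bins=edges, labels=False) + 1

print(df)
```

With bins=10 the edges depend on the minimum and maximum of the data, so 1001 ends up in the 10th bin. With explicit edges the categories stay the same regardless of which values are present, and 1001 falls into category 11.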
Upvotes: 3