Classifying Data in a New Column

Question

I have the following df:

This structure is not ideal for plotting a histogram, which is my main purpose. Bins don't really solve the problem either, at least from what I've tested so far. That's why I'd like to create my own bins in a new column:

I basically want to assign every value in Column 1 to a bucket in Column 2, based on the range it falls in, so that it looks like this:

Column 1    Column 2
1           < 10000
2435        < 10000
3345        < 10000  
104         < 10000
505         < 10000
6005        < 10000
10000       < 50000
80000       < 150000
100000      < 150000
4000000     < 25000000
4440        < 10000
520         < 10000
...

Once I get there, creating a plot will be much easier.

Thanks!

EdChum · Accepted Answer

There is a pandas equivalent for this: pd.cut, and the pandas documentation has a section describing it. For each value, cut returns the interval it falls into; by default the intervals are open on the left and closed on the right:

In [29]:    
df['bin'] = pd.cut(df['Column 1'], bins=[0, 10000, 50000, 150000, 25000000])
df

Out[29]:

    Column 1                 bin
0          1          (0, 10000]
1       2435          (0, 10000]
2       3345          (0, 10000]
3        104          (0, 10000]
4        505          (0, 10000]
5       6005          (0, 10000]
6      10000          (0, 10000]
7      80000     (50000, 150000]
8     100000     (50000, 150000]
9    4000000  (150000, 25000000]
10      4440          (0, 10000]
11       520          (0, 10000]
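To get the "< N" labels from the question instead of interval text, here is a minimal sketch, assuming the sample values from the question's table and the bin edges used above. Passing right=False makes each bin closed on the left, [left, right), so a value lands in the bucket whose right edge it is strictly below (10000 goes to "< 50000", as in the question), and labels replaces the interval text. The column name 'Column 2' comes from the question; the label strings are built here purely for illustration.

```python
import pandas as pd

# Sample data from the question's table
df = pd.DataFrame({'Column 1': [1, 2435, 3345, 104, 505, 6005,
                                10000, 80000, 100000, 4000000, 4440, 520]})

bins = [0, 10000, 50000, 150000, 25000000]

# One label per bin, built from each bin's right edge: '< 10000', '< 50000', ...
labels = [f'< {edge}' for edge in bins[1:]]

# right=False makes every bin [left, right), i.e. "left <= value < right",
# which matches the "< N" buckets in the question
df['Column 2'] = pd.cut(df['Column 1'], bins=bins, right=False, labels=labels)
print(df)

# With the default right=True, bins are (left, right]: 10000 lands in
# (0, 10000] and 0 falls outside every bin (NaN) unless include_lowest=True
print(pd.cut(pd.Series([0, 10000]), bins=bins))
```

The number of labels must be one less than the number of bin edges, since each label names the gap between two consecutive edges.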

The resulting column has dtype category, so it can be used directly for filtering, counting, plotting, etc.
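A short sketch of counting, filtering and plotting with the categorical bin column, rebuilding the sample data from the question (the variable names counts, first_bin and small are just illustrative). Plotting needs matplotlib, so that line is left as a comment.

```python
import pandas as pd

df = pd.DataFrame({'Column 1': [1, 2435, 3345, 104, 505, 6005,
                                10000, 80000, 100000, 4000000, 4440, 520]})
df['bin'] = pd.cut(df['Column 1'], bins=[0, 10000, 50000, 150000, 25000000])

# Counting: sort=False keeps the bins in interval order, and empty bins
# (here (10000, 50000]) are still listed, with a count of 0
counts = df['bin'].value_counts(sort=False)
print(counts)

# Filtering: compare the column against one of its categories (a pd.Interval)
first_bin = df['bin'].cat.categories[0]
small = df[df['bin'] == first_bin]
print(small)

# Plotting: a bar chart of the counts acts as a histogram over the custom bins
# counts.plot.bar()  # requires matplotlib
```

Using value_counts with sort=False rather than the default keeps the bars in range order instead of sorting them by frequency.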
