Reputation: 2232
I have the following df:
Column 1
1
2435
3345
104
505
6005
10000
80000
100000
4000000
4440
520
...
This structure is not ideal for plotting a histogram, which is my main goal. Bins don't really solve the problem either, at least from what I've tested so far. That's why I'd like to create my own bins in a new column:
I basically want to assign every value within a certain range in Column 1 to a bucket in Column2, so that it looks like this:
Column 1 Column2
1 < 10000
2435 < 10000
3345 < 10000
104 < 10000
505 < 10000
6005 < 10000
10000 < 50000
80000 < 150000
100000 < 150000
4000000 < 250000
4440 < 10000
520 < 10000
...
Once I get there, creating a plot will be much easier.
Thanks!
Upvotes: 1
Views: 57
Reputation: 393893
There is a pandas equivalent to this: cut. The pandas documentation has a section describing it. cut returns, for each value, the interval it falls into (open on the left, closed on the right):
In [29]:
df['bin'] = pd.cut(df['Column 1'], bins = [0,10000, 50000, 150000, 25000000])
df
Out[29]:
Column 1 bin
0 1 (0, 10000]
1 2435 (0, 10000]
2 3345 (0, 10000]
3 104 (0, 10000]
4 505 (0, 10000]
5 6005 (0, 10000]
6 10000 (0, 10000]
7 80000 (50000, 150000]
8 100000 (50000, 150000]
9 4000000 (150000, 25000000]
10 4440 (0, 10000]
11 520 (0, 10000]
The dtype of the column is category, and it can be used for filtering, counting, plotting, etc.
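If labels like the ones in the question are preferred over interval notation, cut also accepts labels and right=False. The following is only a minimal sketch, assuming the sample data from the question. The top edge of 25000000 is taken from the code above, and the label strings are illustrative rather than anything the question fixes:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({'Column 1': [1, 2435, 3345, 104, 505, 6005, 10000,
                                80000, 100000, 4000000, 4440, 520]})

edges = [0, 10000, 50000, 150000, 25000000]
# Illustrative label strings (an assumption), one per bin
labels = ['< 10000', '< 50000', '< 150000', '< 25000000']

# right=False makes each bin [low, high), so 10000 lands in '< 50000'
df['Column2'] = pd.cut(df['Column 1'], bins=edges, labels=labels, right=False)

# Count rows per bucket, ordered by bucket rather than by frequency
bucket_counts = df['Column2'].value_counts().sort_index()
print(bucket_counts)
```

From there, something like bucket_counts.plot(kind='bar') should give one bar per bucket.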
Upvotes: 2
Reputation: 76297
numpy.histogram takes a bins parameter, which can be a sequence of bin edges, and returns an array of the counts within those bins. So, if you run
import numpy as np
# Values outside the outermost edges (below 10000 or above 250000) are not counted
counts, _ = np.histogram(df['Column 1'].values, [10000, 50000, 150000, 250000])
you will have the counts for the bins you specified. From here, you can do whatever you want, including plotting the counts for each bin:
import matplotlib.pyplot as plt
plt.plot(counts)
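To count every value from the question, the edges need to span the whole data range. The following is a rough sketch, assuming the sample data from the question and edges extended with 0 at the bottom and 25000000 at the top; both extra edges are assumptions made for illustration:

```python
import numpy as np

# Sample data copied from the question
values = np.array([1, 2435, 3345, 104, 505, 6005, 10000,
                   80000, 100000, 4000000, 4440, 520])

# Assumed edges: 0 and 25000000 are added so that no sample falls outside the bins
edges = [0, 10000, 50000, 150000, 25000000]

# Each bin is [low, high), except the last one, which also includes its right edge
counts, _ = np.histogram(values, bins=edges)

# Print each bin's range next to its count
for low, high, n in zip(edges[:-1], edges[1:], counts):
    print(f'{low} to {high}: {n}')
```

A bar chart such as plt.bar(range(len(counts)), counts) would then show one bar per bucket, which tends to read better than a line plot for unevenly sized bins.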
Upvotes: 1