I am binning an array into a set of bins using np.digitize:
import numpy as np
data = np.array([1, 5, 6, 15, 25, 60])
bins = np.array([5, 10, 20, 50])
result = np.digitize(data, bins)
# this fails with an IndexError
print(bins[result])
I want each value placed into a bin whose upper edge is inclusive ("less than or equal to"), except that the last bin should also catch every value beyond it. In this case the bins would be: x <= 5, 5 < x <= 10, 10 < x <= 20, and 20 < x <= 50, with x > 50 also going into that last bin. Is there a function that does this? What's the concise way to do this in NumPy?
Upvotes: 2
Views: 2725
Reputation: 1021
When you say 20 < x <= 50 including x > 50 for your last bin, you are really saying x > 20. You can get x > 20 by dropping your last bin edge of 50. np.digitize takes a parameter right which, when True, gives you intervals like 10 < x <= 20 rather than the default 10 <= x < 20:
>>> import numpy as np
>>> data = np.array([1, 5, 6, 15, 25, 60])
>>> bins = np.array([5, 10, 20])
>>> np.digitize(data, bins, right=True)
array([0, 0, 1, 2, 3, 3])
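The difference between the two modes only shows up for values that sit exactly on an edge. A small sketch comparing them on the edge values themselves, and checking that for increasing bins right=True gives the same indices as np.searchsorted with side="left" (the variable names here are just for illustration):

```python
import numpy as np

# Values placed exactly on the edges: the only place right=False and right=True disagree.
edges = np.array([5, 10, 20])
x = np.array([5, 10, 20])

left_closed = np.digitize(x, edges)               # default: bins[i-1] <= x < bins[i]
right_closed = np.digitize(x, edges, right=True)  # bins[i-1] < x <= bins[i]
print(left_closed.tolist())   # [1, 2, 3]
print(right_closed.tolist())  # [0, 1, 2]

# For monotonically increasing bins, right=True matches searchsorted with side="left".
same = np.searchsorted(edges, x, side="left")
print(np.array_equal(right_closed, same))  # True
```

So with right=True a value equal to 5 lands in the x <= 5 bin, which is the "less than or equal to" behaviour asked for.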
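Your code bins[result] fails because there is always one more interval than there are bin edges. Even with bins defined with 3 values as above, there are actually 4 intervals (x <= 5, 5 < x <= 10, 10 < x <= 20, 20 < x). So, for example, 65 will be placed in the bin with index 3, i.e. the 4th interval, but bins has no 4th value (bins[3] does not exist), hence your error. The same happens with your original four edges: 60 gets index 4, and bins[4] is out of range.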
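If you would rather keep the 50 edge, so that bins[result] returns an upper edge for every value, one option (a sketch of an alternative, not the approach described above) is to clip the overflow index back into the last bin. This folds x > 50 into the 20 < x <= 50 bin, as the question describes:

```python
import numpy as np

data = np.array([1, 5, 6, 15, 25, 60])
bins = np.array([5, 10, 20, 50])  # keep the original four edges, including 50

# right=True: x <= 5 -> 0, 5 < x <= 10 -> 1, 10 < x <= 20 -> 2,
# 20 < x <= 50 -> 3, and x > 50 -> 4 (one past the last edge).
idx = np.digitize(data, bins, right=True)

# Fold the overflow interval (x > 50) into the last bin so every index is valid.
idx = np.clip(idx, 0, len(bins) - 1)

print(idx.tolist())        # [0, 0, 1, 2, 3, 3]
print(bins[idx].tolist())  # [5, 5, 10, 20, 50, 50]
```

The indices are the same as in the three-edge version; clipping only changes whether bins[idx] is safe to evaluate.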
Upvotes: 4