Reputation: 6223
I've got an array of (random) floating point numbers. I want to round each value up to the next limit of an arbitrary grid. See the following example:
import numpy as np
np.random.seed(1)
# Setup
sample = np.random.normal(loc=20, scale=6, size=10)
intervals = [-np.inf, 10, 12, 15, 18, 21, 25, 30, np.inf]
# Round each interval up
for i in range(len(intervals) - 1):
    sample[np.logical_and(sample > intervals[i], sample <= intervals[i+1])] = intervals[i+1]
This results in:
[ 30. 18. 18. 15. 30. 10. inf 18. 25. 21.]
How can I avoid the for loop? I'm sure there's some way using NumPy's array magic that I don't see right now.
Upvotes: 10
Views: 2088
Reputation: 43524
Another option is:
np.array(intervals)[(sample[:,None] > intervals).sum(axis=1)]
#array([30., 18., 18., 15., 30., 10., inf, 18., 25., 21.])
Essentially, we build a boolean mask that checks whether each sample is greater than each interval limit (this assumes intervals is already sorted, as in your example). Then we sum along axis 1, which adds 1 for every limit that the value exceeds.
The resulting sums are the indices into the intervals array.
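To make the mask-and-sum step concrete, here is a minimal sketch on a shortened grid. The three sample values and the reduced intervals array are made up for illustration and are not the data from the question.

```python
import numpy as np

# Hypothetical small inputs, chosen so the mask is easy to read
intervals = np.array([-np.inf, 10, 12, 15, np.inf])
sample = np.array([9.5, 12.0, 13.7])

# Shape (3, 5): row k compares sample[k] against every grid limit
mask = sample[:, None] > intervals

# Each row sum counts the limits strictly below the value, which is
# also the index of the first limit that is >= the value
idx = mask.sum(axis=1)
rounded = intervals[idx]

print(mask.astype(int))
print(idx)      # [1 2 3]
print(rounded)  # [10. 12. 15.]
```

Because the comparison is strict, a value that sits exactly on a limit (12.0 here) stays at that limit, which matches the loop in the question.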
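A non-NumPy solution using a list comprehension (it obviously still contains a for loop, but the generator stops at the first matching limit, so it should be relatively efficient). Using >= keeps a value that lies exactly on a limit at that limit, as in your loop:
new_sample = [next(i for i in intervals if i >= s) for s in sample]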
#[30, 18, 18, 15, 30, 10, inf, 18, 25, 21]
Upvotes: 1
Reputation: 215047
If intervals is sorted, you can use np.searchsorted:
np.array(intervals)[np.searchsorted(intervals, sample)]
# array([ 30., 18., 18., 15., 30., 10., inf, 18., 25., 21.])
searchsorted returns, for each element, the index of the interval limit it should be rounded up to:
np.searchsorted(intervals, sample)
# array([7, 4, 4, 3, 7, 1, 8, 4, 6, 5])
The default side='left' returns the smallest such index i, with intervals[i-1] < value <= intervals[i], so each interval is treated as open on the left and closed on the right.
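The effect of the side argument is easiest to see with values right around a limit. A small sketch, using the intervals from the question and three made-up values near 12:

```python
import numpy as np

intervals = np.array([-np.inf, 10, 12, 15, 18, 21, 25, 30, np.inf])
values = np.array([11.9, 12.0, 12.1])  # hypothetical values around the limit 12

# side='left' is the default: intervals behave like (a, b]
left = np.searchsorted(intervals, values)
# side='right': intervals behave like [a, b)
right = np.searchsorted(intervals, values, side='right')

print(intervals[left])   # [12. 12. 15.]
print(intervals[right])  # [12. 15. 15.]
```

With side='left', 12.0 stays at 12, which is the same behavior as the sample <= intervals[i+1] condition in the question. With side='right', it would be pushed up to 15.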
Upvotes: 9
Reputation: 29099
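If values is a 1D array with your values, you could do something like the following (intervals has to be a sorted NumPy array for the indexing to work, hence the asarray call):
intervals = np.asarray(intervals)
# diff[i, k] is True where intervals[i] >= values[k]; <= keeps boundary values in place
diff = values <= intervals[:, None]
# argmax picks the first True per column, i.e. the smallest limit not below the value
t = np.argmax(diff, axis=0)
new_values = intervals[t]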
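For anyone wondering why argmax works here: on a boolean array it returns the position of the first True. A minimal sketch with made-up inputs (the shortened grid and the three values are assumptions for illustration). It uses <= rather than <, so that a value lying exactly on a limit stays there, as in the question's loop; with < it would move to the next limit.

```python
import numpy as np

# Hypothetical small inputs; the grid must be sorted and end with inf
# so that every column contains at least one True
intervals = np.array([-np.inf, 10, 12, 15, np.inf])
values = np.array([9.5, 12.0, 13.7])

# Shape (5, 3): column k marks every limit that is >= values[k]
diff = values <= intervals[:, None]

# argmax returns the index of the first True in each column
t = np.argmax(diff, axis=0)
new_values = intervals[t]

print(t)           # [1 2 3]
print(new_values)  # [10. 12. 15.]
```

Note that this builds a full len(intervals) x len(values) boolean array, so memory use grows with both sizes.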
Upvotes: 0
Reputation: 21274
You can use pandas' cut():
import pandas as pd
pd.cut(sample, intervals, labels=intervals[1:]).tolist()
Upvotes: 4
Reputation: 8277
I did not test this, but:
from bisect import bisect_left
for index, value in enumerate(sample):
    # bisect_left keeps a value that lies exactly on a limit at that limit
    sample[index] = intervals[bisect_left(intervals, value)]
Upvotes: 0