ascripter

Reputation: 6223

round float values to interval limits / grid

I've got an array of (random) floating-point numbers. I want to round each value up to the next limit of an arbitrary grid, so that every value in (a, b] becomes b. See the following example:

import numpy as np
np.random.seed(1)

# Setup
sample = np.random.normal(loc=20, scale=6, size=10)
intervals = [-np.inf, 10, 12, 15, 18, 21, 25, 30, np.inf]

# Round each interval up
for i in range(len(intervals) - 1):
    sample[np.logical_and(sample > intervals[i], sample <= intervals[i+1])] = intervals[i+1]

This results in:

[ 30.  18.  18.  15.  30.  10.  inf  18.  25.  21.]

How can I avoid the for loop? I'm sure there's some way using NumPy's array magic that I don't see right now.

Upvotes: 10

Views: 2088

Answers (5)

pault

Reputation: 43524

Another option is:

np.array(intervals)[(sample[:,None] > intervals).sum(axis=1)]
#array([30., 18., 18., 15., 30., 10., inf, 18., 25., 21.])

Essentially, we build a boolean mask that checks whether each sample value is greater than each interval limit (this assumes intervals is sorted, as in your example). Then we sum along axis 1, which counts, for each value, how many limits it is strictly greater than.

The resulting sums are the indices into the intervals array.
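To make the counting step concrete, here is a small sketch on three hand-picked values (chosen for illustration, not taken from the question). It shows the per-value counts that come out of the broadcast mask and how they index into the limits.

```python
import numpy as np

intervals = np.array([-np.inf, 10, 12, 15, 18, 21, 25, 30, np.inf])
values = np.array([11.0, 12.0, 26.5])  # illustrative; 12.0 sits exactly on a limit

# (3, 1) compared with (9,) broadcasts to (3, 9): one row per value, one column per limit
mask = values[:, None] > intervals
counts = mask.sum(axis=1)  # number of limits strictly below each value
rounded = intervals[counts]

print(counts)   # [2 2 7]
print(rounded)  # [12. 12. 30.]
```

Because the comparison is strict, a value equal to a limit (12.0 here) is not counted past that limit and stays on it.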

A non-NumPy solution using a list comprehension (it still loops in Python, but the generator stops at the first limit that is >= the value):

new_sample = [next(i for i in intervals if i >= s) for s in sample]
#[30, 18, 18, 15, 30, 10, inf, 18, 25, 21]
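A value that lands exactly on a limit is worth checking with the generator approach. The question's loop uses `sample <= intervals[i+1]`, so 12.0 should stay 12.0. The sketch below wraps the idea in a hypothetical helper, `round_up_to_grid` (not part of the answer), that uses `>=` to match that rule. It also shows what a strict `>` does instead.

```python
import math

intervals = [-math.inf, 10, 12, 15, 18, 21, 25, 30, math.inf]

def round_up_to_grid(values, limits):
    """Hypothetical helper: map each value to the first limit >= it.

    Assumes `limits` is sorted ascending and ends with +inf, so next()
    always finds a match (a NaN value would raise StopIteration).
    """
    return [next(lim for lim in limits if lim >= v) for v in values]

print(round_up_to_grid([11.0, 12.0, 31.0], intervals))  # [12, 12, inf]
# With a strict '>' the tie moves up one cell: 12.0 -> 15
print([next(lim for lim in intervals if lim > v) for v in [12.0]])  # [15]
```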

Upvotes: 1

akuiper

Reputation: 215047

If intervals is sorted, you can use np.searchsorted:

np.array(intervals)[np.searchsorted(intervals, sample)]
# array([ 30.,  18.,  18.,  15.,  30.,  10.,  inf,  18.,  25.,  21.])

searchsorted returns, for each element, the index of the upper limit of the interval it falls into:

np.searchsorted(intervals, sample)
# array([7, 4, 4, 3, 7, 1, 8, 4, 6, 5])

With the default side='left', searchsorted returns the smallest index i such that intervals[i-1] < value <= intervals[i]. The intervals are therefore left-open and right-closed, just like the loop in the question.
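The effect of `side` is easiest to see with values that sit exactly on limits. This sketch uses values chosen for illustration and compares the default `side='left'` with `side='right'`. Only the left variant keeps 12.0 on 12 and so reproduces the question's (a, b] rule.

```python
import numpy as np

intervals = np.array([-np.inf, 10, 12, 15, 18, 21, 25, 30, np.inf])
values = np.array([12.0, 12.5, 30.0])

left = np.searchsorted(intervals, values)                 # first i with intervals[i] >= v
right = np.searchsorted(intervals, values, side="right")  # first i with intervals[i] > v

print(intervals[left])   # [12. 15. 30.]
print(intervals[right])  # [15. 15. inf]
```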

Upvotes: 9

blue note

Reputation: 29099

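If values is a 1D array with your values (e.g. sample), you could do something like:

intervals = np.asarray(intervals)    # a plain list does not support [:, None]
diff = values <= intervals[:, None]  # True where a limit is >= the value
t = np.argmax(diff, axis=0)          # index of the first such limit for each value
new_values = intervals[t]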
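This works because `np.argmax` on a boolean array returns the index of the first True along the axis, since True is the maximum. The sketch below wraps the idea in a hypothetical helper, `round_up_argmax`, and compares with `<=` so a value equal to a limit stays on it. It relies on the last limit being +inf, so every column has at least one True. An all-False column would silently give index 0.

```python
import numpy as np

def round_up_argmax(values, intervals):
    """Hypothetical helper: round each value up to the first limit >= it."""
    limits = np.asarray(intervals, dtype=float)
    vals = np.asarray(values, dtype=float)
    # Shape (len(limits), len(vals)): True where the limit is >= the value
    mask = vals <= limits[:, None]
    # argmax over booleans returns the index of the first True in each column
    first = np.argmax(mask, axis=0)
    return limits[first]

grid = [-np.inf, 10, 12, 15, 18, 21, 25, 30, np.inf]
print(round_up_argmax([9.0, 12.0, 29.9], grid))  # [10. 12. 30.]
```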

Upvotes: 0

andrew_reece

Reputation: 21274

You can use pandas' cut(). Its bins are right-closed by default (right=True), so each value in (a, b] gets the label b:

import pandas as pd

pd.cut(sample, intervals, labels=intervals[1:]).tolist()
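If a NumPy array is preferred over a list, one variant asks `pd.cut` for integer bin codes with `labels=False` and indexes the upper edges directly. This is a sketch, not part of the original answer. With the default `right=True`, bin k covers (intervals[k], intervals[k+1]], so its upper edge is `intervals[k + 1]`.

```python
import numpy as np
import pandas as pd

intervals = [-np.inf, 10, 12, 15, 18, 21, 25, 30, np.inf]
values = np.array([9.0, 12.0, 29.9, 31.0])  # illustrative values

# labels=False returns the integer bin index of each value instead of labels
codes = pd.cut(values, intervals, labels=False)
upper_edges = np.asarray(intervals)[codes + 1]
print(upper_edges)  # [10. 12. 30. inf]
```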

Upvotes: 4

Learning is a mess

Reputation: 8277

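I haven't run this, but the standard-library bisect module should work. Use bisect_left so that a value equal to a limit stays on that limit:

from bisect import bisect_left

for index, value in enumerate(sample):
    sample[index] = intervals[bisect_left(intervals, value)]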
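The choice between `bisect.bisect` (an alias of `bisect_right`) and `bisect.bisect_left` only matters for values that equal a limit. A short check on illustrative values:

```python
import math
from bisect import bisect_left, bisect_right

intervals = [-math.inf, 10, 12, 15, 18, 21, 25, 30, math.inf]
values = [12.0, 12.5]

# bisect_left: first index whose limit is >= value, i.e. (a, b] cells as in the question
print([intervals[bisect_left(intervals, v)] for v in values])   # [12, 15]
# bisect_right (what plain bisect() does): first index whose limit is > value
print([intervals[bisect_right(intervals, v)] for v in values])  # [15, 15]
```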

Upvotes: 0
