How to efficiently find indexes of intervals in Python

Question

I have a grid, and a list of values lying in the grid. How can I efficiently compute an index list for the values corresponding to the grid interval containing them. Here is a sample code:

xgrid = [304.0, 317.3, 330.7, 344.1, 357.4, 370.8]
xlist = [310, 320, 360]

output = []
for x in xlist:
    for xi in xgrid:
        if (xi < x):
            xindex = xi
    output.append(xindex)

print(output)

The expected output for this example is [304.0, 317.3, 357.4].

xgrid will be of size 50 or so, but xlist may be bigger and contain between 100-200 values.

VirtualScooter · Accepted Answer

The Python standard library offers bisect, which can be used to do the searching for you:

import bisect

xgrid = [304.0, 317.3, 330.7, 344.1, 357.4, 370.8]
xlist = [310, 320, 335]

def find_lt(a, x):
    'Find rightmost value less than x'
    i = bisect.bisect_left(a, x)
    if i:
        return a[i-1]
    raise ValueError

print([find_lt(xgrid, x) for x in xlist])
# Output: [304.0, 317.3, 330.7]

With respect to speed, I tried the following (append to code above):

import timeit

s = '''\
output = []
for x in xlist:
    for xi in xgrid:
        if (xi < x):
            xindex = xi
    output.append(xindex)
'''
s2 = '''\
output = [find_lt(xgrid, x) for x in xlist]
'''
print(timeit.timeit(s, number=100_000, globals=globals()))
print(timeit.timeit(s2, number=100_000, globals=globals()))

xgrid = [203.1, 207.2, 304.0, 317.3, 330.7,
         344.1, 357.4, 370.8, 400.1, 401.0]
xlist = [310, 320, 335, 399, 402]
print(timeit.timeit(s, number=100_000, globals=globals()))
print(timeit.timeit(s2, number=100_000, globals=globals()))

Output

0.11740579998877365
0.1047545000037644
0.28514970000833273
0.18074260000139475

This would indicate that the bisection algorithm is slightly faster for small lists, and likely becomes progressively better with larger lists.

How to efficiently find indexes of intervals in Python

Answers (2)

Related Questions