user975176
user975176

Reputation: 448

Which is the fastest way to search a value within a set of many "range" objects in Python

I have a list of many Python objects like this:

class RangeClass(object):

    def __init__(self,address,size):
        self.address=address
        self.size=size
        #other attributes and methods...

Then, I have a list (rangelist) of RangeClass objects.

I need to find within which range a given value is.

I can use some code like this:

for r in ragelist:
    if(value>=r.address and value<(r.address+r.size)):
        return r
return None

But I think there is a faster way. Ranges have arbitrary size, but we can assume that they don't overlap.

Thank you.

Upvotes: 1

Views: 173

Answers (4)

user975176
user975176

Reputation: 448

Thank you to everyone,

I am actually using the method proposed by unutbu.

Moreover, I add another optimization:

if(value <= first_range_value or value >= last_range_value):
    return None

Where first_range_value and last_range_value have been computed before, and they are the smallest r.address value and the largest r.address+r.size value .

This is worthing in my application, but it really depends on the distribution of ranges and values.

Upvotes: 1

unutbu
unutbu

Reputation: 879401

If you have many values to test, then you could use the bisect module to find which range the values are in more quickly.

If

  • m = the number of values to test, and
  • n = len(rangelist)

then looping through the values and rangelist as you suggest would take O(m*n) time.

If you use bisection, then you must first sort the starting addresses O(nlogn) and find each value's place in rangelist O(m*logn). So if

O(nlogn + m*logn) < O(m*n)

then bisection wins. For large n, O(m*logn) is miniscule compared to O(m*n). So the inequality above would be true if

O(nlogn) < O(m*n)

or equivalently, when

C log(n) < m

for some constant C.


Thus, when n is large and C log(n) < m, you might try something like

import bisect

class RangeClass(object):

    def __init__(self,address,size):
        self.address=address
        self.size=size
    def __str__(self):
        return '[{0},{1})'.format(self.address,self.address+self.size)
    def __lt__(self,other):
        return self.address<other.address

rangelist=sorted([RangeClass(i,1) for i in (1,3,4,5,7.5)])
starts=[r.address for r in rangelist]

def find_range(value):
    start_idx=bisect.bisect(starts,value)-1
    try:
        r=rangelist[start_idx]
    except IndexError:
        return None
    start=r.address
    end=r.address+r.size
    if start<=value<end:
        return rangelist[start_idx]
    return None    

print(','.join(str(r) for r in rangelist))

for value in (0,1,1.5,2,3,4,5,6,7,8,9,10):
    r=find_range(value)
    if r:
        print('{v} in {r}'.format(v=value,r=str(r)))
    else:
        print('{v} not in any range'.format(v=value))

Upvotes: 3

cJ Zougloub
cJ Zougloub

Reputation: 1494

Implement the comparison operator in Range, have the range list sorted, and use bisect to search for the range a value belongs in:

import bisect
def find_range(value):
    index = bisect.bisect(rangelist, value)
    if index not in (0, len(rangelist)):
        index -= 1
    return rangelist[index]

Upvotes: 1

Ignacio Vazquez-Abrams
Ignacio Vazquez-Abrams

Reputation: 798606

Not really. All you can do is take advantage of Python's relational operator chaining.

if r.address <= value < (r.address + r.size):

You could also define __contains__ on RangeClass to allow you to use in to find it instead.

class RangeClass(object):
   ...
  def __contains__(self, val):
    return self.address <= val < (self.address + self.size)

 ...
  if value in r:

Upvotes: 2

Related Questions