user1179317
user1179317

Reputation: 2903

Fastest way to check if the set contains numbers in a given range in Python

What is the fastest way to check if a set contains at least one number within a given range?

For example setA = set(1,4,7,9,10), lowerRange=6, upperRange=8, will return True because of 7.

Currently I am using:

filtered = filter(lambda x: lowerRange<=x<=upperRange,setA)

Then if filtered is not empty, returns a True.

Assuming that setA can be a very large set, is this the optimal solution? Or is this iterating through the entire setA?

Upvotes: 3

Views: 3713

Answers (3)

Raymond Hettinger
Raymond Hettinger

Reputation: 226231

The fastest way is to work with a sorted list or tuple instead of a set. That way you can do the range searches using the bisect module.

Upvotes: 1

Kasravnd
Kasravnd

Reputation: 107287

Since the membership chick is approximately O(1) for sets you can use a generator expression within any() built-in function:

rng = range(6, 9)
any(i in setA for i in rng)

Note that for a short range you'll give a better performance with set.intersection():

In [2]: a = {1,4,7,9,10}

In [3]: rng = range(6, 9)

In [8]: %timeit bool(a.intersection(rng))
1000000 loops, best of 3: 344 ns per loop

In [9]: %timeit any(i in a for i in rng)
1000000 loops, best of 3: 620 ns per loop

But in this case for longer ranges you'd definitely go with any():

In [10]: rng = range(6, 9000)

In [11]: %timeit any(i in a for i in rng)
1000000 loops, best of 3: 620 ns per loop

In [12]: %timeit bool(a.intersection(rng))
1000 loops, best of 3: 233 µs per loop

And note that the reason that any() performs better is because it returns True right after it encounter an item that exist in your set (based on our membership condition) an since the number 8 is at the beginning of the range it makes the any() per forms so fast. Also as mentioned in comment as a more pythonic way for checking the validity of the intersection of an iterable within a set you can use isdisjoint() method. Here is a benchmark with this method for small rage:

In [26]: %timeit not a.isdisjoint(rng)
1000000 loops, best of 3: 153 ns per loop

In [27]: %timeit any(i in a for i in rng)
1000000 loops, best of 3: 609 ns per loop

And here is a benchmark that makes the any() checks the membership for all the numbers which shows that isdisjoint() performs so better:

In [29]: rng = range(8, 1000)

In [30]: %timeit any(i in a for i in rng)
1000000 loops, best of 3: 595 ns per loop

In [31]: %timeit not a.isdisjoint(rng)
10000000 loops, best of 3: 142 ns per loop

Upvotes: 5

SarcasticSully
SarcasticSully

Reputation: 462

Unless you plan to use those values, using the filter function is unnecessary, because it stores data that you won't end up using. It also keeps going even after it finds one that fits the criteria, slowing you down quite a bit.

My solution would have been to write and use the following function.

def check(list, lower, upper):
    for i in list:
        if i >= lower and i <= upper:
            return True
    return False

Like with @Kasramvd's answer, and your idea, this is the brute-force search (O(n) solution). That's impossible to beat unless there are some constraints on the data beforehand, like that it has to be sorted.

Upvotes: 0

Related Questions