Reputation: 2903
What is the fastest way to check if a set contains at least one number within a given range?
For example setA = set(1,4,7,9,10)
, lowerRange=6
, upperRange=8
, will return True because of 7.
Currently I am using:
filtered = filter(lambda x: lowerRange<=x<=upperRange,setA)
Then if filtered is not empty, returns a True.
Assuming that setA can be a very large set, is this the optimal solution? Or is this iterating through the entire setA?
Upvotes: 3
Views: 3713
Reputation: 226231
The fastest way is to work with a sorted list or tuple instead of a set. That way you can do the range searches using the bisect module.
Upvotes: 1
Reputation: 107287
Since the membership chick is approximately O(1) for sets you can use a generator expression within any()
built-in function:
rng = range(6, 9)
any(i in setA for i in rng)
Note that for a short range you'll give a better performance with set.intersection()
:
In [2]: a = {1,4,7,9,10}
In [3]: rng = range(6, 9)
In [8]: %timeit bool(a.intersection(rng))
1000000 loops, best of 3: 344 ns per loop
In [9]: %timeit any(i in a for i in rng)
1000000 loops, best of 3: 620 ns per loop
But in this case for longer ranges you'd definitely go with any()
:
In [10]: rng = range(6, 9000)
In [11]: %timeit any(i in a for i in rng)
1000000 loops, best of 3: 620 ns per loop
In [12]: %timeit bool(a.intersection(rng))
1000 loops, best of 3: 233 µs per loop
And note that the reason that any()
performs better is because it returns True right after it encounter an item that exist in your set (based on our membership condition) an since the number 8 is at the beginning of the range it makes the any()
per forms so fast. Also as mentioned in comment as a more pythonic way for checking the validity of the intersection of an iterable within a set you can use isdisjoint()
method. Here is a benchmark with this method for small rage:
In [26]: %timeit not a.isdisjoint(rng)
1000000 loops, best of 3: 153 ns per loop
In [27]: %timeit any(i in a for i in rng)
1000000 loops, best of 3: 609 ns per loop
And here is a benchmark that makes the any()
checks the membership for all the numbers which shows that isdisjoint()
performs so better:
In [29]: rng = range(8, 1000)
In [30]: %timeit any(i in a for i in rng)
1000000 loops, best of 3: 595 ns per loop
In [31]: %timeit not a.isdisjoint(rng)
10000000 loops, best of 3: 142 ns per loop
Upvotes: 5
Reputation: 462
Unless you plan to use those values, using the filter
function is unnecessary, because it stores data that you won't end up using. It also keeps going even after it finds one that fits the criteria, slowing you down quite a bit.
My solution would have been to write and use the following function.
def check(list, lower, upper):
for i in list:
if i >= lower and i <= upper:
return True
return False
Like with @Kasramvd's answer, and your idea, this is the brute-force search (O(n) solution). That's impossible to beat unless there are some constraints on the data beforehand, like that it has to be sorted.
Upvotes: 0