Reputation: 358
Let's say I have a unsorted set of items:
input = set([45, 235, 3, 77, 55, 80, 154])
I need to get random values from this input but in a specific range. E.g. when I have
ran = [50, 100]
I want it to return either 77 or 55 or 80. What's the fastest way to get this for large sets in python?
Upvotes: 0
Views: 305
Reputation: 13185
You didn't clarify whether you could or could not use numpy
but also asked for "the fastest" so I'll include the numpy
method for completeness. In this case, the "python_method
" approach is the answer given by Jean-François Fabre here
import numpy as np
import bisect,random
data = np.random.randint(0, 60, 10000)
high = 25
low = 20
def python_method(data, low, high):
data = sorted(data)
start_index = bisect.bisect_left(data,low)
end_index = bisect.bisect_right(data,high)
return data[random.randrange(start_index,end_index)]
def numpy_method(data, low, high):
return np.random.choice(data[(data >=low) & (data <= high)])
Timings:
%timeit python_method(data, low, high)
2.34 ms ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit numpy_method(data, low, high)
33.2 µs ± 72.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Obviously, though, you'd only sort
the list once if you were using that function several times, so that will cut down the Python runtime quite to the same level.
%timeit new_data = sorted(data)
2.33 ms ± 39.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
numpy
would pull ahead again in cases where you needed multiple results from within a single range as you could get them in a single call.
EDIT:
In the case that the input array is already sorted, and you're sure you can exploit that (taking sorted()
out of timeit
), the pure python method wins in the case of picking single values:
%timeit python_method(data, low, high)
5.06 µs ± 16.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The un-modified numpy
method gives:
%timeit numpy_method(data, low, high)
20.5 µs ± 668 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
So, as far as I can tell, in cases where the list is already sorted, and you only want one result, the pure-python method wins. If you wanted multiple results from within that range it might be different but I'm benchmarking against randrange
.
Upvotes: 2
Reputation: 140307
Using a set
for this isn't the right way because elements aren't sorted. This would lead to a O(N)
solution to test each element against the boundaries.
I'd suggest to turn the data into a sorted list, then you can use bisect
to find start & end indexes for your boundary values, then apply random.choice
on the sliced list:
import bisect,random
data = sorted([45, 235, 3, 77, 55, 80, 154])
def rand(start,stop):
start_index = bisect.bisect_left(data,start)
end_index = bisect.bisect_right(data,stop)
return data[random.randrange(start_index,end_index)]
print(rand(30,100))
bisect
has O(log(N))
complexity on sorted lists. Then pick an index with random.randrange
.
bisect
uses compiled code on mainstream platforms, so it's very efficient besides its low complexity.
Boundaries are validated by performing a limit test:
print(rand(235,235))
which prints 235
as expected (always difficult to make sure that the arrays aren't out of bounds when using random)
(if you want to update your data while running, you can also use bisect
to insert elements, it's slower than with set
because of the O(log N)
complexity + insertion in list
, of course but you cannot have everything)
Upvotes: 6
Reputation: 6536
Here is my suggestion that I find readable, easy to understand and quite short:
import random
inputSet = set([45, 235, 3, 77, 55, 80, 154])
ran = [50,100]
# Get list of elements inside the range
a = [x for x in inputSet if x in range(ran[0],ran[1])]
# Print a random element
print(random.choice(a)) # randomly 55, 77 or 80
Note that I have not used the name input
for the defined set because it is a reserved built-in symbol.
Upvotes: 0
Reputation: 12439
from random import randint
input = set([45, 235, 3, 77, 55, 80, 154])
ran = [50, 100]
valid_values = []
for i in input:
if ran[0] <= i <= ran[1]:
valid_values.append(i)
random_index = randint(0, len(valid_values)-1)
print(valid_values[random_index])
Upvotes: 1