larsl
larsl

Reputation: 358

Get random items from range of list

Let's say I have a unsorted set of items:

input = set([45, 235, 3, 77, 55, 80, 154])

I need to get random values from this input but in a specific range. E.g. when I have

ran = [50, 100]

I want it to return either 77 or 55 or 80. What's the fastest way to get this for large sets in python?

Upvotes: 0

Views: 305

Answers (4)

roganjosh
roganjosh

Reputation: 13185

You didn't clarify whether you could or could not use numpy but also asked for "the fastest" so I'll include the numpy method for completeness. In this case, the "python_method" approach is the answer given by Jean-François Fabre here

import numpy as np
import bisect,random

data = np.random.randint(0, 60, 10000)
high = 25
low = 20

def python_method(data, low, high):
    data = sorted(data)
    start_index = bisect.bisect_left(data,low)
    end_index = bisect.bisect_right(data,high)
    return data[random.randrange(start_index,end_index)]

def numpy_method(data, low, high):
    return np.random.choice(data[(data >=low) & (data <= high)])

Timings:

%timeit python_method(data, low, high)
2.34 ms ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit numpy_method(data, low, high)
33.2 µs ± 72.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Obviously, though, you'd only sort the list once if you were using that function several times, so that will cut down the Python runtime quite to the same level.

%timeit new_data = sorted(data)
2.33 ms ± 39.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

numpy would pull ahead again in cases where you needed multiple results from within a single range as you could get them in a single call.

EDIT:

In the case that the input array is already sorted, and you're sure you can exploit that (taking sorted() out of timeit), the pure python method wins in the case of picking single values:

%timeit python_method(data, low, high)
5.06 µs ± 16.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The un-modified numpy method gives:

%timeit numpy_method(data, low, high)
20.5 µs ± 668 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

So, as far as I can tell, in cases where the list is already sorted, and you only want one result, the pure-python method wins. If you wanted multiple results from within that range it might be different but I'm benchmarking against randrange.

Upvotes: 2

Jean-Fran&#231;ois Fabre
Jean-Fran&#231;ois Fabre

Reputation: 140307

Using a set for this isn't the right way because elements aren't sorted. This would lead to a O(N) solution to test each element against the boundaries.

I'd suggest to turn the data into a sorted list, then you can use bisect to find start & end indexes for your boundary values, then apply random.choice on the sliced list:

import bisect,random

data = sorted([45, 235, 3, 77, 55, 80, 154])

def rand(start,stop):
    start_index = bisect.bisect_left(data,start)
    end_index = bisect.bisect_right(data,stop)
    return data[random.randrange(start_index,end_index)]

print(rand(30,100))

bisect has O(log(N)) complexity on sorted lists. Then pick an index with random.randrange.

bisect uses compiled code on mainstream platforms, so it's very efficient besides its low complexity.

Boundaries are validated by performing a limit test:

print(rand(235,235))

which prints 235 as expected (always difficult to make sure that the arrays aren't out of bounds when using random)

(if you want to update your data while running, you can also use bisect to insert elements, it's slower than with set because of the O(log N) complexity + insertion in list, of course but you cannot have everything)

Upvotes: 6

Laurent H.
Laurent H.

Reputation: 6536

Here is my suggestion that I find readable, easy to understand and quite short:

import random

inputSet = set([45, 235, 3, 77, 55, 80, 154])
ran = [50,100]

# Get list of elements inside the range
a = [x for x in inputSet if x in range(ran[0],ran[1])]

# Print a random element
print(random.choice(a))  # randomly 55, 77 or 80

Note that I have not used the name input for the defined set because it is a reserved built-in symbol.

Upvotes: 0

Harald Nordgren
Harald Nordgren

Reputation: 12439

from random import randint

input = set([45, 235, 3, 77, 55, 80, 154])
ran = [50, 100]

valid_values = []
for i in input:
    if ran[0] <= i <= ran[1]:
        valid_values.append(i)

random_index = randint(0, len(valid_values)-1)
print(valid_values[random_index])

Upvotes: 1

Related Questions