ShanZhengYang

Reputation: 17631

Drawing random numbers spaced by at least a pre-defined interval with `numpy.random.choice()`

I would like to use numpy.random.choice() but make sure that the draws are spaced by at least a certain "interval":

As a concrete example,

import numpy as np
np.random.seed(123)
interval = 5
foo = np.random.choice(np.arange(1,50), 5)  ## 5 random draws from array([ 1,  2, ..., 49])
print(foo)
## array([46,  3, 29, 35, 39])

I would prefer these be spaced by at least the interval+1, i.e. 5+1=6. In the above example, this condition isn't met: there should be another random draw, as 35 and 39 are separated by 4, which is less than 6.

The array array([46, 3, 29, 15, 39]) would be ok, as all draws are spaced by at least 6.

numpy.random.choice(array, size) draws size elements from array. Is there another function that checks the "spacing" between elements in a numpy array? I could write the above with an if/while statement, but I'm not sure how to check the spacing of elements in a numpy array most efficiently.

Upvotes: 5

Views: 554

Answers (2)

user8936101

Reputation:

You can sort the array first so that all points are in ascending order, then use np.diff to find the difference between consecutive values. If any difference is less than or equal to the interval, the condition has not been met, e.g.

import numpy as np

interval = 5
foo = np.random.choice(np.arange(1,50),5)
while np.any(np.diff(np.sort(foo)) <= interval):
     foo = np.random.choice(np.arange(1,50),5)
print(foo)

This loops until you get a numpy array in which all values differ by more than the interval, i.e. by at least interval+1.
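The check itself can be pulled out into a small helper, which makes it easy to test against the two arrays from the question. This is only a sketch; `min_gap_ok` is a made-up name, not a NumPy function.

```python
import numpy as np

def min_gap_ok(values, interval):
    """Return True if all values differ pairwise by more than `interval`.

    Hypothetical helper: after sorting, it is enough to look at
    neighbouring values, which is what np.diff gives us.
    """
    gaps = np.diff(np.sort(values))
    return bool(np.all(gaps > interval))

print(min_gap_ok([46, 3, 29, 35, 39], 5))  # False: 35 and 39 differ by 4
print(min_gap_ok([46, 3, 29, 15, 39], 5))  # True: the smallest gap is 7
```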

Upvotes: 2

Paul Panzer

Reputation: 53029

Here is a solution that inserts the spaces after drawing:

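import numpy as np

def spaced_choice(low, high, delta, n_samples):
    # draw distinct values from a range shortened by the total space to be inserted
    draw = np.random.choice(high-low-(n_samples-1)*delta, n_samples, replace=False)
    # idx[k] is the position of the k-th smallest draw
    idx = np.argsort(draw)
    # shift the k-th smallest draw by low + k*delta
    draw[idx] += np.arange(low, low + delta*n_samples, delta)
    return draw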

Sample run:

spaced_choice(4, 20, 3, 4)
# array([ 5,  9, 19, 13])
spaced_choice(1, 50, 5, 5)
# array([30,  8,  1, 15, 43])
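To see what the three lines do, here is the same computation step by step for low, high, delta, n_samples = 1, 50, 5, 5. The random draw is replaced by a fixed array (my own illustrative values, not output from a real run) so the result is reproducible.

```python
import numpy as np

low, high, delta, n_samples = 1, 50, 5, 5

# Stand-in for np.random.choice(29, 5, replace=False): five distinct
# values from range(high - low - (n_samples-1)*delta) = range(29).
# Fixed by hand for illustration only.
draw = np.array([20, 3, 0, 7, 19])

idx = np.argsort(draw)                                      # [2, 1, 3, 4, 0]
offsets = np.arange(low, low + delta * n_samples, delta)    # [1, 6, 11, 16, 21]

spaced = draw.copy()
spaced[idx] += offsets        # the k-th smallest draw is shifted by low + k*delta
print(spaced)                 # [41  9  1 18 35]
print(np.diff(np.sort(spaced)))  # [ 8  9 17  6]
```

The draws 19 and 20 were only 1 apart, and after shifting they end up at 35 and 41. Distinct draws differ by at least 1, so neighbours in the result always differ by at least delta+1.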

Please note that a draw-then-accept-or-reject-and-redraw strategy can be very expensive. In the worst-case example below, redrawing takes almost half a minute for just 10 samples because the accept/reject ratio is very poor. The insert-the-spaces-afterwards method has no problems of this kind.

Time required by different methods for two examples:

low, high, delta, size = 1, 100, 5, 5
add_spaces            0.04245870 ms
redraw                0.11335560 ms
low, high, delta, size = 1, 20, 1, 10
add_spaces            0.03201030 ms
redraw            27881.01527220 ms
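The gap in the redraw timings can be estimated by counting. This is my own back-of-the-envelope sketch, assuming the redraw loop samples with replacement from arange(low, high) and accepts only when all gaps are larger than delta. The name `acceptance_probability` is hypothetical.

```python
from math import comb, factorial

def acceptance_probability(low, high, delta, n_samples):
    """Chance that one draw with replacement from arange(low, high)
    has all pairwise gaps larger than delta (hypothetical helper)."""
    m = high - low                       # number of candidate values
    free = m - (n_samples - 1) * delta   # values left after reserving the gaps
    if free < n_samples:
        return 0.0                       # no valid configuration exists
    # valid sets: comb(free, n); each can be drawn in n! orders; m**n draws in total
    return factorial(n_samples) * comb(free, n_samples) / m ** n_samples

for args in [(1, 100, 5, 5), (1, 20, 1, 10)]:
    p = acceptance_probability(*args)
    print(args, f"p = {p:.3g}, expected draws = {1 / p:.3g}")
```

Under these assumptions the first example accepts a bit more than one draw in four, while the second has exactly one valid set of values (1, 3, ..., 19) and needs well over a million draws on average.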

Code:

import numpy as np

import types
from timeit import timeit

def f_add_spaces(low, high, delta, n_samples):
    draw = np.random.choice(high-low-(n_samples-1)*delta, n_samples, replace=False)
    idx = np.argsort(draw)
    draw[idx] += np.arange(low, low + delta*n_samples, delta)
    return draw

def f_redraw(low, high, delta, n_samples):
    foo = np.random.choice(np.arange(low, high), n_samples)
    while any(x <= delta for x in np.diff(np.sort(foo))):
        foo = np.random.choice(np.arange(low, high), n_samples)
    return foo

for l, h, k, n in [(1, 100, 5, 5), (1, 20, 1, 10)]:
    print(f'low, high, delta, size = {l}, {h}, {k}, {n}')
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(*args)', globals={'f':func, 'args':(l,h,k,n)}, number=10)*100))

Upvotes: 3
