A.M.
A.M.

Reputation: 1797

generate non-consecutive samples

How can we efficiently generate k random and non-consecutive samples out of [1,...,N]?

Non-desired Example with (N=10, k=4): 2,3,8,10

This is not a desired example since 2 and 3 are consecutive.

Desired Example with (N=10, k=4): 2,6,8,10

This is a good example since the difference between every pair of samples is greater than 1

Upvotes: 13

Views: 1388

Answers (9)

Daniel
Daniel

Reputation: 36720

sort(randperm(N-(k-1),k))+[0:(k-1)]

There is a simple obersavation behind this solution, if you take any sorted solution to your problem and substract [0:(k-1)], you end up with a random choice of k numbers out of N-(k-1)

Upvotes: 15

rayryeng
rayryeng

Reputation: 104545

A solution in MATLAB (perhaps inelegant) could be something like this:

N = 10;
k = 4;
out = zeros(1,k);

vec = 1 : N;

for idx = 1 : k
    ind = randi(numel(vec), 1);
    left = max(ind-1, 1); right = min(numel(vec), ind+1);
    out(idx) = vec(ind);
    to_remove = ind;
    if vec(left) == vec(ind)-1 
        to_remove = [to_remove left];
    end
    if vec(right) == vec(ind)+1
        to_remove = [to_remove right];
    end
    vec(to_remove) = [];
end

We first declare N and k, then declare an output array of zeroes that is k long. We then generate a sampling vector vec that goes from 1 up to as N initially. Next, for each value we want to place into the output, we generate a random position to sample from the vector, then take a look at the position from the left and from the right... ensuring that we are within the boundaries of the array. Also, we only remove to the left or right if the value at the left of the index to remove and also the right are equal to each other (thanks beaker!)

We use this location and sample from this vector, place the value at this location to the output, then remove the indices in this vector that are to the left, to the right, and the actual index itself from this vector. This removes the possibility of sampling from those values again. We repeat this until we run out of values to place in the output.

Here are a few of trial runs:

>> out

out =

     9     7     1     5

>> out

out =

     7     1     4    10

>> out

out =

    10     8     1     6

>> out

out =

    10     4     8     1

Upvotes: 2

Luis Mendo
Luis Mendo

Reputation: 112749

Let S denote the set of all k-element vectors with values taken from [1,...,N] that don't have any consecutive values. To randomly sample with a uniform distribution over S, you can use a rejection method:

  1. Sample uniformly on a larger sample space, T.
  2. If the sample belongs to the target region S, accept the sample. Else return to step 1 (the sample is rejected).

In Matlab, it's easy to generate uniformly distributed k-element vectors with values taken from [1,...,N] without replacement (function randsample). So this is used as the sample space T:

k = 4;
N = 10;
result = [1 1];                         % // just to get while loop started
while any(diff(result)<=1)              % // does not meet condition: try again
    result = sort(randsample(N, k).');  %'// random sample without replacement
end

Upvotes: 4

fferri
fferri

Reputation: 18950

My implementation:

def ncsample(population, k):
    import random
    if k > 0:
        i = random.randrange(0, len(population) - 2*(k-1))
        return [population[i]] + ncsample(population[i+2:], k-1)
    else:
        return []

Note: it randomly finds the sequence in one shot (no rejection sampling in a while loop).

MATLAB implementation:

function r = ncsample(population, k)
    if k > 0
        i = randi(length(population) - 2*(k-1));
        r = [population(i) ncsample(population((i+2):end), k-1)];
    else
        r = [];
    end
end

Some tests:

>> for i=1:10; fprintf('%s\n',sprintf('%d ', ncsample(1:10, 4))); end
1 5 7 9 
3 5 8 10 
3 5 8 10 
4 6 8 10 
2 6 8 10 
1 4 8 10 
1 4 7 9 
3 6 8 10 
1 6 8 10 
2 4 7 9 

Upvotes: 1

Stijn Nevens
Stijn Nevens

Reputation: 256

This is a recursive elegant version, I just added a check on k and N to avoid infinite recursion, if k>N/2 no solution exists.

The result is guaranteed random.

import random

def myFunc(N,k):
    if k>(N+1)/2:
        return "k to big for N"
    returnValue = sorted(random.sample(range(1,N+1),k))
    toTest = [x - returnValue[i - 1] for i, x in enumerate(returnValue)][1:]
    if 1 in toTest:
        return myFunc(N,k)
    else:
        return returnValue

print myFunc(10,4)

Upvotes: 1

nicolas
nicolas

Reputation: 3290

You can make the increment between samples evenly distributed between 2 and N-1 (to avoid consecutive and repeated numbers):

N=10;
k=4;
increments = floor(rand(1,k)*(N-2))+2  %// increments allowed are from 2 to N-1 inclusive
out = mod(cumsum(increments), N)+1   %// sum increments

Same in python:

from numpy import cumsum, floor, mod, random
N=5
k=100
increments = floor(random.rand(1,k)*(N-2))+2
out = mod(cumsum(increments), N)+1
print(out)

[ 5.  3.  1.  5.  2.  4.  3.  2.  4.  2.  4.  3.  1.  5.  4.  3.  5.  4.
  2.  5.  4.  2.  5.  2.  4.  1.  5.  4.  1.  5.  3.  1.  3.  2.  4.  1.
  5.  4.  1.  3.  5.  4.  3.  5.  2.  1.  3.  2.  4.  3.  1.  4.  2.  1.
  3.  2.  1.  4.  3.  2.  1.  3.  5.  3.  5.  4.  2.  4.  2.  1.  3.  2.
  1.  3.  5.  2.  5.  4.  3.  1.  4.  1.  4.  3.  5.  4.  2.  1.  5.  2.
  1.  5.  4.  2.  4.  3.  5.  2.  4.  1.]

Over 100 iterations, even if I limit the number to 1..5, there is no repeated/consecutive number.

Upvotes: 2

Blair
Blair

Reputation: 6693

A Python class which correctly checks every pair of samples. You're responsible for not passing it a set of numbers that is impossible, though (like N = 10, k = 100).

>>> class NonConsecutiveSampler(object):
        def __init__(self,N):
                import random
                self.num = N
        def get_samples(self,k):
                possibilities = [i for i in range(1,self.num + 1)]
                samples = []
                while len(samples) < k:
                        r = random.sample(possibilities,1)[0]
                        samples.append(r)
                        for i in range(r - 1, r + 2):
                                if i in possibilities:
                                        possibilities.remove(i)
                samples.sort()
                return samples


>>> n = NonConsecutiveSampler(10)
>>> n.get_samples(4)
[2, 5, 8, 10]
>>> n.get_samples(4)
[1, 5, 7, 10]
>>> n.get_samples(4)
[3, 6, 8, 10]
>>> n.get_samples(4)
[1, 3, 5, 8]

EDIT: Made it much more efficient

Upvotes: 3

Carl Witthoft
Carl Witthoft

Reputation: 21532

Sometimes it's faster and easier to generate more samples than you need and then throw away the undesireable values.

One (slow) example.

vec= randi(100,1,1);
for j = 2:50,
   while(abs(vec(j)-vec(j-1)<2) vec(j)= randi(100,1,1);end;
end

Another way. Suppose you want 50 samples

vec = rand(100,100,1);
badindex = find(abs(vec(1:99)-vec(2:100) < 1));
vec(badindex) = vec(badindex-1)+vec(badindex+1);
% if you don't want big values,
vec(vec>100) = vec (vec>100) -100; % to ensure, I hope, that neighbors

% are nonconsecutive (This would be easier in R) .

Upvotes: 2

mdurant
mdurant

Reputation: 28684

A not-particularly-elegant python solution:

def nonseq(n, k):
    out = [random.randint(0, n)]
    while len(out) < k:
        x = random.randint(0, n)
        if abs(x - out[-1]) > 1:
            out.append(x)
    return out

Upvotes: 1

Related Questions