Reputation: 49

Generating M distinct random numbers (one at a time) from a given range 0..N-1 in less than O(M) memory

Is there any method to do this?

I mean, we even cannot work with "in" array of {0,1,..,N-1} (because it's at least O(N) memory).

M can be = N. N can be > 2^64. Result should be uniformly random and would better be every possible sequence (but may not).

Also full-range PRNGs (and friends) aren't suitable, because it will give same sequence each time.

Time complexity doesn't matter.

Upvotes: 4

Answers (3)

barmatat

Reputation: 291

Brute force: if time complexity doesn't matter it would be a solution for 0 < M <= N invariant. nextRandom(N) is a function which returns random integer in [0..N):

init() {
        for (int idx = 0; idx < N; idx++) {
            a[idx] = -1;
        }
        for (int idx = 0; idx < M; idx++) {
            getNext();
        }
    }

    int getNext() {
        for (int idx = 1; idx < M; idx++) {
            a[idx -1] = a[idx];
        }
        while (true) {
            r = nextRandom(N);
            idx = 0;
            while (idx < M && a[idx] != r) idx++;
            if (idx == M) {
                a[idx - 1] = r;
                return r;
            }
        }
    }

O(M) solution: It is recursive solution for simplicity. It supposes to run nextRandom() which returns a random number in [0..1):

rnd(0, 0, N, M); // to get next M distinct random numbers

int rnd(int idx, int n1, int n2, int m) {
    if (n1 >= n2 || m <= 0) return idx;
    int r = nextRandom(n2 - n1) + n1;
    int m1 = (int) ((m-1.0)*(r-n1)/(n2-n1) + nextRandom()); // gives [0..m-1]
    int m2 = m - m1 - 1;
    idx = rnd(idx, n1, r-1, m1);
    print r;
    return rnd(idx+1, r+1, n2, m2);
}

the idea is to select a random r in between [0..N) on first step which splits the range on two sub-ranges by N1 and N2 elements in each (N1+N2==N-1). We need to repeat the same step for [0..r) which has N1 elements and [r+1..N) (N2 elements) choosing M1 and M2 (M1+M2==M-1) so as M1/M2 == N1/N2. M1 and M2 must be integers, but the proportion can give real results, we need to round values with probabilities (1.2 will give 1 with p=0.8 and 2 with p=0.2 etc.).

Upvotes: 0

rici

Reputation: 241721

If you don't care what order the random selection comes out in, then it can be done in constant memory. The selection comes out in order.

The answer hinges on estimating the probability that the smallest value in a random selection of M distinct values of the set {0, ..., N-1} is i, for each possible i. Call this value p(i, M, N). With more mathematics than I have the patience to type into an interface which doesn't support Latex, you can derive some pretty good estimates for the p function; here, I'll just show the simple, non-time-efficient approach.

Let's just focus on p(0, M, N), which is the probability that a random selection of M out of N objects will include the first object. Then we can iterate through the objects (that is, the numbers 0...N-1) one at a time; deciding for each one whether it is included or not by flipping a weighted coin. We just need to compute the coin's weights for each flip.

By definition, there are _MC_N possible M-selections of a set of N objects. Of these _MC_N-1 do not include the first element. (That's the count of M-selections of N-1 objects, which is all the M-selections of the set missing one element). Similarly, _M-1C_N-1 selections do include the first element (that is, all the M-1-selections of the N-1-set, with the first element added to each selection).

These two values add up to _MC_N; the well-known recursive algorithm for computing C.

So p(0, M, N) is just _M-1C_N-1/_MC_N. Since _MC_N = N!/(M!*(N-M)!), we can simplify that fraction to M/N. As expected, if M == N, that works out to 1 (M of N objects must include every object).

So now we know what the probability that the first object will be in the selection. We can then reduce the size of the set, and either reduce the remaining selection size or not, depending on whether the coin flip determined that we did or did not include the first object. So here's the final algorithm, in pseudo-code, based on the existence of the weighted random boolean function:

w(x, y) => true with probability X / Y; otherwise false.

I'll leave the implementation of w for the reader, since it's trivial.

So:

Generate a random M-selection from the set 0...N-1
Parameters: M, N

Set i = 0
while M > 0:
  if w(M, N):
     output i
     M = M - 1
  N = N - 1
  i = i + 1

It might not be immediately obvious that that works, but note that:

the output i statement must be executed exactly M times, since it is coupled with a decrement of M, and the while loop executes until M is 0
The closer M gets to N, the higher the probability that M will be decremented. If we ever get to the point where M == N, then both will be decremented in lockstep until they both reach 0.
i is incremented exactly when N is decremented, so it must always be in the range 0...N-1. In fact, it's redundant; we could output N-1 instead of outputting i, which would change the algorithm to produce sets in decreasing order instead of increasing order. I didn't do that because I think the above is easier to understand.

The time complexity of that algorithm is O(N+M) which must be O(N). If N is large, that's not great, but the problem statement said that time complexity doesn't matter, so I'll leave it there.

Upvotes: 3

pjs

Reputation: 19855

PRNGs that don't map their state space to a lower number of bits for output should work fine. Examples include Linear Congruential Generators and Tausworthe generators. They will give the same sequence if you use the same seed to start them, but that's easy to change.

Upvotes: 1

Generating M distinct random numbers (one at a time) from a given range 0..N-1 in less than O(M) memory

Answers (3)

Related Questions