JASON

Reputation: 7491

uniformly distributed random number generation

Why does this code generate uniformly distributed numbers? I'm having trouble understanding it. Could someone explain? Thanks.

int RandomUniform(int n) {  
  int top = ((((RAND_MAX - n) + 1) / n) * n - 1) + n;  
  int r;  
  do {  
    r = rand();  
  } while (r > top);  
  return (r % n);  
}

Update: I do understand why rand() % n doesn't give you a uniformly distributed sequence. My question is why top is computed this way:

top = ((((RAND_MAX - n) + 1) / n) * n - 1) + n;

What's the concern here? I think a simple top = RAND_MAX / n * n would do.

Upvotes: 6

Views: 4675

Answers (3)

Mike Seymour

Reputation: 254431

The function assumes that rand() is uniformly distributed; whether or not that is a valid assumption depends on the implementation of rand().

Given a uniform rand(), we can get a random number in the range [0,n) by calculating rand()%n. However, in general, this won't be quite uniform. For example, suppose n is 3 and RAND_MAX is 7:

rand()      0 1 2 3 4 5 6 7
rand() % n  0 1 2 0 1 2 0 1

We can see that 0 and 1 come up with a probability of 3/8, while 2 only comes up with a probability of 2/8: the distribution is not uniform.
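
The counts in the table can be checked mechanically. Here is a minimal sketch in C; count_residue and max_value are names made up for this illustration, with max_value standing in for RAND_MAX so that the small case above can be tried.

```c
#include <assert.h>

/* Hypothetical helper: count how many raw values in 0..max_value
   (inclusive) map to `residue` under % n. max_value stands in for
   RAND_MAX so that tiny cases can be examined. */
int count_residue(int max_value, int n, int residue) {
    assert(n > 0 && max_value >= 0);
    int count = 0;
    for (int r = 0; r <= max_value; ++r) {
        if (r % n == residue) {
            ++count;
        }
    }
    return count;
}
```

With max_value = 7 and n = 3, the residues 0 and 1 are each counted three times and the residue 2 only twice, which matches the 3/8 and 2/8 probabilities.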

Your code discards any value of rand() greater than or equal to the largest multiple of n that does not exceed RAND_MAX + 1 (here, 6), so the values that remain split evenly among the n results. Now each value has an equal probability:

rand()      0 1 2 3 4 5 6 7
rand() % n  0 1 2 0 1 2 X X

So 0, 1 and 2 all come up with a probability of 1/3, as long as we are not so unlucky that the loop never terminates.
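
To put numbers on both the equal counts and the "unlucky" case, here is a rough sketch. The helper names count_accepted and acceptance_probability are invented for this example, and top is passed in directly rather than computed.

```c
#include <assert.h>

/* Hypothetical helper: among the raw values that are kept (0..top,
   inclusive), count how many map to `residue` under % n. */
int count_accepted(int top, int n, int residue) {
    assert(n > 0 && top >= 0);
    int count = 0;
    for (int r = 0; r <= top; ++r) {
        if (r % n == residue) {
            ++count;
        }
    }
    return count;
}

/* Probability that a single uniform draw from 0..max_value is kept,
   i.e. that it is <= top. Computed in double to avoid int overflow. */
double acceptance_probability(int max_value, int top) {
    assert(0 <= top && top <= max_value);
    return ((double)top + 1.0) / ((double)max_value + 1.0);
}
```

For the table above (max_value = 7, top = 5, n = 3) every residue is counted twice, and a draw is kept with probability 0.75, so the loop needs about 1.33 draws on average. A run of rejections of length k has probability 0.25 to the power k, which shrinks quickly.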

Regarding your update:

I think a simple top = RAND_MAX / n * n would do.

If RAND_MAX were an exclusive bound (one more than the actual maximum), then that would be correct. Since it's an inclusive bound, we need to add one to get the exclusive bound; and since the following logic compares with > against an inclusive bound, we subtract one again after the calculation:

int top = ((RAND_MAX + 1) / n) * n - 1;

However, if RAND_MAX were equal to INT_MAX, then the calculation would overflow; to avoid that, subtract n at the beginning of the calculation, and add it again at the end:

int top = (((RAND_MAX - n) + 1) / n) * n - 1 + n;
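
As a sanity check that the two formulas agree, here is a small sketch. The function names are made up for illustration, max_value stands in for RAND_MAX, and it assumes long long is wider than int (true on common platforms) and that 0 < n <= max_value.

```c
#include <assert.h>
#include <limits.h>

/* Straightforward formula, evaluated in long long so that
   max_value + 1 cannot overflow even when max_value == INT_MAX.
   Assumes long long is wider than int. */
long long top_wide(int max_value, int n) {
    assert(n > 0 && n <= max_value);
    return (((long long)max_value + 1) / n) * n - 1;
}

/* Formula from the question, evaluated entirely in int:
   subtract n first, add it back at the end. */
int top_safe(int max_value, int n) {
    assert(n > 0 && n <= max_value);
    return (((max_value - n) + 1) / n) * n - 1 + n;
}

/* Returns 1 if both formulas agree when the maximum is INT_MAX. */
int tops_agree_at_int_max(int n) {
    return top_wide(INT_MAX, n) == (long long)top_safe(INT_MAX, n);
}
```

Subtracting n before the division lowers the quotient by exactly one, and adding n at the end restores that one multiple of n, so the result is unchanged while every intermediate value stays at or below max_value.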

Upvotes: 10

Pete Becker

Reputation: 76245

I didn't trace through the code that computes top, but RAND_MAX is the largest value that rand() can return; (RAND_MAX + 1) / n * n would be a better ceiling, but if RAND_MAX is, say, INT_MAX, the result would be unpredictable. So maybe all that code is trying to avoid overflow.

Upvotes: 2

Pete Becker

Reputation: 76245

The underlying problem is this: suppose you have a random number generator my_rand() that produces values from 0 to 6, inclusive, and you want to generate values from 0 to 5, inclusive; if you run your generator and return my_rand() % 6, you won't get a uniform distribution. When my_rand() returns 0, you get 0; when it returns 1, you get 1, etc. until my_rand() returns 6; in that case my_rand() % 6 is 0. So overall, my_rand() % 6 will return 0 twice as often as any other value.

The way to fix this is to not use values greater than 5, that is, instead of just returning my_rand() % 6 you write a loop and discard values from my_rand() that are too large. That's essentially what the code in the question is doing. I haven't traced it through, but the usual implementation is to compute the largest multiple of n that is less than or equal to RAND_MAX, and whenever rand() returns a value that's greater than or equal to that multiple, go back and get a new value.
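
A rough sketch of that loop for the 0-to-6 example follows. Here my_rand_stub is a deterministic stand-in invented for this illustration (it just cycles through 0..6 and is not random), and draw_0_to_5 is likewise a made-up name.

```c
#include <assert.h>

/* Deterministic stand-in for my_rand(): cycles through 0, 1, ..., 6.
   It is NOT random; it only makes the behaviour easy to follow. */
static int next_raw = 0;

int my_rand_stub(void) {
    int value = next_raw;
    next_raw = (next_raw + 1) % 7;
    return value;
}

/* Draw a value in 0..5 by discarding the raw value 6 and trying again. */
int draw_0_to_5(void) {
    int r;
    do {
        r = my_rand_stub();
    } while (r > 5);
    assert(r >= 0 && r <= 5);
    return r;
}
```

Over one full cycle of the seven raw values, each of 0 through 5 is returned exactly once and the 6 is skipped, so no result is favoured. With a real uniform generator in place of the stub, the same loop gives each result a probability of 1/6.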

Upvotes: 7
