Reputation: 238
In C, I can use this simple expression to generate a random number in a range:
rand() % (max_number + 1 - minimum_number) + minimum_number
Is there a similar, non loop based expression that can generate a random number outside a range? For example if my range is from 3 to 5, I would want a random number that would either be in the range 0 to 2, or 6 to RAND_MAX.
Upvotes: 3
Views: 711
Reputation: 31306
This would work:
int r = rand() % (maxval+1 - rangesize);
if(r>=rangemin)
r+=rangesize;
From your example, rangesize
would be 3 since there are three numbers in the range you want to avoid, and rangemin
would be 3 because it is the smallest number in that range.
This would produce random numbers in the range [0, maxval]
except numbers in the range [rangemin, rangemin+rangesize-1]
However, do note that using rand() % x
often gives a bad distribution, so if that matters you need to think about that. Thanks to rici for pointing this out. See his answer for more information about this.
But given that you have a function r(lo, hi)
that generates uniformly distributed numbers from lo
to hi
, the transformation if(r>=rangemin) r+=rangesize
will work just fine and not mess up the distribution.
Relevant links:
Generating a uniform distribution of INTEGERS in C
http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx
Upvotes: 4
Reputation: 241701
The premise is incorrect, assuming you want an unbiased random number. And because the premise is incorrect, there is no analogous solution. Or, better said, because the provided code produces a biased sample, any analogous code for sampling outside of the range will also produce a biased sample.
In C, I can use this simple expression to generate a random number in a range:
rand() % (max_number + 1 - minimum_number) + minimum_number
That will work, more or less, if the size of the range is small relative to the size of the range of possible values returned by rand()
. It will only produce a truly unbiased valuebif rand()
itself is unbiased and the size of the desired range is a factor of RAND_MAX + 1
. Since RAND_MAX + 1
is commonly a power of 2, the only range sizes which can produce an unbiased selection are also powers of two.
It's easy to see that with the pigeon-hole principle. Imagine that there are s
pigeon-holes, each corresponding to a value in the desired range. (s
must be max - min + 1
, of course.) Now each of the RAND_MAX + 1
values which rand()
could produce must be put into one of the pigeon-holes. Since each of those values has equal probability, and the probability of a pigeon-hole being selected is the sum of the probabilities of its contents, it follows that an unbiased outcome requires that all the pigeon-holes have the same number of values, which is only possible if s
is a factor of the number of possible values.
In the not uncommon case that RAND_MAX + 1
is 32768 (Windows, for example), if s
were 6 (a roll of a die), then there would be 5461 values in each of four pigeon-holes, and 5462 values in the other two. That bias is not enormous but it wouldn't pass an inspection by the Gaming Commissioner.
The situation is more dramatic when the desired range is close to RAND_MAX + 1
, which is the situation you will produce if you exclude a small range. In that case, most of the pigeon-holes will have just one value, and a handful of lucky pigeon-holes will each have two values, and therefore be twice as likely to be picked.
The simplest fix is "rejection sampling" which involves a loop. We reject s % (RAND_MAX + 1)
possible returns of rand()
by calling rand()
again if one of these values comes up. (It doesn't matter which values are rejected but it is simple to reject the ones less than s % (RAND_MAX + 1)
.) In the worst case, just under half of the possible return values will be rejected and the loop will run once on average. In more common cases, it will hardly ever run and branch prediction will reduce its cost to very little.
Upvotes: 2
Reputation: 1003
A simple solution would be:
int b = rand() % 2, nr;
if (b)
nr = rand() % min; // [0, min - 1]
else
nr = rand() % (RAND_MAX - max) + max + 1; // [max + 1, RAND_MAX]
To avoid biasing for certain intervals (meaning different probabilities for different numbers), you'll still have to use loops. You may want to check the answer provided here for that.
Edit: As klutt pointed out, this solution adds some bias of its own since the obtained number has a 50% chance to be lower than your min and a 50% chance to be higher than your max, regardless of size difference between the two intervals. Therefore, unless you specifically want this behaviour, the other solution is better for minimal bias.
Upvotes: 0