Reputation: 15
So I have a homework assignment, and we need to generate random numbers between 1 and 100 in C. I have a working example with int i = rand()%100.
But according to the homework that is technically incorrect, which I don't really get. The homework explanation is as follows:
"1.1 We use a random number generator to simulate bus arrival times. ===> the rand( ) function. The rand( ) function returns a pseudo random number 0 to RAND_MAX (2^31-1 in linux). To generate a random number, rn, between 0.0 and 1.0; rn = rand( ) / RAND_MAX. (by the way, a lot of people do below to create, say, 2 digit random numbers. r_num = rand( ) % 100; since % 100 is 0 to 99. However, this is wrong. The right way of generate 2 digit random number is: divide 0-RAND_MAX in 10 intervals and see where the random number falls. The interval time is, it = RAND_MAX / 100. Then, map it to one of 0 - 99 by the following:
  0    1    2    3   .........   99
0   it   2it  3it  ......... 99it  to RAND_MAX
If the rand( ) returns a number is between (12*it) and (13*it), the 2 digit random number is 12.)"
I was hoping someone could take a stab at explaining what it is saying. I'm not really looking for code examples, just an understanding of the problem.
Upvotes: 1
Views: 1498
Reputation: 754190
You can find relevant code on SO. For example, the rand_int() code below is based on code for integers in an answer to "Is this C implementation of the Fisher-Yates shuffle correct?" (and specifically the answer by Roland Illig):
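    #include <stdlib.h>

    /* Return a uniformly distributed value in 0..n-1 (n must be 1..RAND_MAX). */
    static size_t rand_int(size_t n)
    {
        /* A multiple of n that does not exceed RAND_MAX. */
        size_t limit = RAND_MAX - RAND_MAX % n;
        size_t rnd;

        /* Reject values at or above the limit; they would bias the result. */
        while ((rnd = rand()) >= limit)
            ;
        return rnd % n;
    }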
The idea is that you calculate a limit and ignore the large values returned by rand() at or above it, since they would lead to biassed results. When one of those large values is returned, you discard it and try the next value. This will seldom need more than two calls to rand().
You might find some of the external references in Shuffle array in C useful too.
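Since the question asks for 1 to 100 rather than 0 to 99, here is a minimal sketch of the same rejection idea with an offset added. The name rand_between() is made up for illustration, and it assumes the range is small compared with RAND_MAX:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: uniform integer in lo..hi inclusive, built on rand().
 * Assumes lo <= hi and that the range is smaller than RAND_MAX. */
static int rand_between(int lo, int hi)
{
    assert(lo <= hi && hi - lo < RAND_MAX);
    unsigned span = (unsigned)(hi - lo) + 1u;  /* number of possible results */
    /* A multiple of span that does not exceed RAND_MAX. */
    unsigned limit = (unsigned)RAND_MAX - (unsigned)RAND_MAX % span;
    unsigned r;
    do {
        r = (unsigned)rand();                  /* retry when r is in the biased tail */
    } while (r >= limit);
    return lo + (int)(r % span);
}
```

Calling rand_between(1, 100) then gives values from 1 to 100; seed with srand() once at program start if you want a different sequence on each run.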
Upvotes: 1
Reputation: 141235
RAND_MAX is usually 2^31 - 1 (on Linux with glibc, for example), so it is equal to 2147483647.
But let's assume for simplicity that we have a very strange system, with RAND_MAX = 100 (so rand() can return 0 to 100, that's 101 numbers). And let's assume the rand() function has an ideal uniform distribution.
Now, what is the probability of each result of rand() % 100? The numbers 1 to 99 each have the same probability, 1/101. But 0 has probability 2/101, because the expression rand() % 100 equals 0 both when rand() returns 0 and when rand() returns 100. So 0 comes up twice as often as any other number, and our distribution of 2-digit numbers from rand() % 100 is not uniform.
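The same counting works for any RAND_MAX. As a rough sketch (residue_count() is a name I made up, and max stands in for RAND_MAX), the number of raw values that map to a given result can be computed directly:

```c
#include <assert.h>

/* Hypothetical helper: how many values r in 0..max (inclusive) satisfy r % n == k.
 * max stands in for RAND_MAX. */
static long long residue_count(long long max, long long n, long long k)
{
    assert(max >= 0 && n > 0 && k >= 0 && k < n);
    long long total = max + 1;   /* how many raw values exist */
    long long base = total / n;  /* every result gets at least this many */
    long long extra = total % n; /* results 0..extra-1 get one more */
    return base + (k < extra ? 1 : 0);
}
```

With max = 100 and n = 100 this gives 2 for k = 0 and 1 for every other k, matching the 2/101 versus 1/101 argument. With max = 2147483647 the results 0 to 47 each get one more raw value than 48 to 99, so the bias is tiny there but still present.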
Now, the text proposes a solution to the problem. The proposed solution is to split the range 0 to RAND_MAX into 100 even parts, so that each part has the same probability. Then roll rand() and see in which region the number landed. If RAND_MAX is 2147483647 and we get, for example, 279172968, we can see it lands in region 13 (counting from 0): between RAND_MAX / 100 * 13 = 279172868 and RAND_MAX / 100 * 14 = 300647704.
The solution is also flawed: it is impossible to divide the range 0 to RAND_MAX into 100 even parts when the number of possible values, RAND_MAX + 1, is not a multiple of 100, that is, when (RAND_MAX + 1) % 100 is not equal to 0.
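To make the interval lookup and its leftover concrete, here is a small sketch. The function bucket_of() and its -1 return value are my own illustration, not something from the homework text; max stands in for RAND_MAX:

```c
#include <assert.h>

/* Hypothetical helper: map a raw value r in 0..max to an interval number,
 * using an interval width of max / n (integer division).
 * Returns -1 when r lands past the last full interval. */
static int bucket_of(long long r, long long max, int n)
{
    assert(n > 0 && max >= n && r >= 0 && r <= max);
    long long width = max / n;  /* "it" in the homework text */
    long long b = r / width;    /* which interval r falls into */
    return b < n ? (int)b : -1; /* interval n or above is the uneven tail */
}
```

With max = 2147483647 and n = 100, the value 279172968 maps to 13, while anything from 2147483600 upward has no interval of its own and comes back as -1. Those leftover values are exactly what has to be handled somehow.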
I feel the only viable solution is to discard all numbers greater than or equal to RAND_MAX / 100 * 100 (using C integer arithmetic). The remaining numbers, 0 to RAND_MAX / 100 * 100 - 1, have a uniform distribution and their count is divisible by 100, so with them we can just use rand() % 100. So something like this:

    #include <stdlib.h>

    int get_2_digit_number(void) {
        /* Largest multiple of 100 that fits in 0..RAND_MAX. */
        const int limit = RAND_MAX / 100 * 100;
        int r;
        do {
            r = rand();    /* values in limit..RAND_MAX would favour the low results */
        } while (r >= limit);
        return r % 100;
    }
Upvotes: 2
Reputation: 1993
There are a couple of problems there, both having to do with how the modulo operator works. a % b effectively gives you the remainder when you divide a by b. So let's suppose that we're computing numbers modulo 4. Let's also assume that RAND_MAX = 6, because I really don't want to have 32768+ rows in my table.
a | a % 4
------------
0 | 0
1 | 1
2 | 2
3 | 3
4 | 0
5 | 1
6 | 2
So if you're using your approach to generate random numbers between 1 and 4, you have two problems. First, the simple one: you're generating numbers between 0 and 3, not 1 and 4. For non-negative operands, the result of the modulo operator will always be between 0 and the modulus minus one.
The other problem is more subtle. If the modulus doesn't divide evenly into the number of possible values (RAND_MAX + 1, since 0 counts too), you won't get the same probability of each number. In the case of our example, there are 2 ways each to make 0 through 2, but only one way to make 3. So 3 will occur ~14.3% of the time, and each other number will occur ~28.6% of the time. To get a uniform distribution, you need to find a way to deal with cases where the modulus doesn't divide evenly.
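If you'd rather check the table by brute force than by eye, a sketch like this tallies the results. tally_residues() is a made-up name, and max stands in for RAND_MAX:

```c
#include <assert.h>

/* Hypothetical helper: count how often each result appears when every
 * value 0..max is reduced mod n. counts must have room for n entries. */
static void tally_residues(int max, int n, int *counts)
{
    assert(max >= 0 && n > 0 && counts != 0);
    for (int k = 0; k < n; k++)
        counts[k] = 0;
    for (int a = 0; a <= max; a++)
        counts[a % n]++;   /* one row of the table per value of a */
}
```

For max = 6 and n = 4 the tallies come out as 2, 2, 2, 1, which is where the 2/7 and 1/7 figures come from.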
Upvotes: 7