Reputation: 436
I ran a small experiment with a random number generator in C++. Here is the code:
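#include <cstdlib>   // srandom, random (POSIX)
#include <iostream>
using namespace std;

int main()
{
    unsigned int array[10] = {0};  // array[d] counts how often d occurs
    unsigned int rand_seed = 4567;
    int loop = 0;
    srandom(rand_seed);
    // Draw 2147483647 values and tally each one modulo 10.
    while (loop < 2147483647)
    {
        array[random() % 10]++;
        loop++;
    }
    for (int i = 0; i < 10; i++)
    {
        cout << array[i] << endl;
    }
    return 0;
}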
The code is simple, so there is not much to explain. I learned that the modulo operation introduces a small bias: in this case 0 should occur more often than the other values, since it is counted both when 0 itself occurs and whenever 10 occurs. But when I display the contents of my array, the counts are almost the same for every number between 0 and 9 (inclusive).
Can anyone tell me whether this bias really exists? If the modulo operation does introduce bias, why can't I see it?
In math terms, can I say that my random variable X takes definite values between 0 and 9 (inclusive), and that by plotting the frequencies (essentially the array values) the resulting graph is a probability density function?
To make the question complete, here is the result I get in my array:
214765115
214745521
214749449
214749304
214747088
214733986
214745858
214743477
214760340
214743509
Upvotes: 2
Views: 2836
Reputation: 29724
The code is simple, so there is not much to explain. I learned that the modulo operation introduces a small bias: in this case 0 should occur more often than the other values, since it is counted both when 0 itself occurs and whenever 10 occurs.
Not only 10: every value returned by random() wraps to something in [0,9], because the modulo is taken with 10 as the divisor. So there is a mapping from the values returned by random() (let's assume [0,255] for illustration; POSIX random() has a wider range, but the idea is what matters) onto the domain [0,9]. Unless the number of possible values is a multiple of 10, some results receive one more source value than others, and this introduces bias.
In math terms, can I say that my random variable X takes definite values between 0 and 9 (inclusive), and that by plotting the frequencies (essentially the array values) the resulting graph is a probability density function?
This is definitely a distribution; however, it is not uniform on [0,9] but weighted slightly toward the lower values. In our example there are n=256 possible values, and since X is discrete the function below is strictly a probability mass function rather than a density:
x f(x)
0 26/256
1 26/256
2 26/256
3 26/256
4 26/256
5 26/256
6 25/256
7 25/256
8 25/256
9 25/256
sum 1
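The table can be checked by brute force. Here is a minimal sketch, assuming the hypothetical 8-bit generator used in this answer (values 0 to 255, each equally likely); `tally_residues` is a name made up for illustration and is not part of any library:

```cpp
#include <array>
#include <cstdint>

// Count how many of the `range` equally likely values 0 .. range-1
// land on each residue when reduced modulo 10.
// `tally_residues` is a hypothetical helper, not a library function.
std::array<std::uint64_t, 10> tally_residues(std::uint64_t range)
{
    std::array<std::uint64_t, 10> counts{};  // all counters start at zero
    for (std::uint64_t v = 0; v < range; ++v)
        ++counts[v % 10];
    return counts;
}
```

Calling `tally_residues(256)` gives 26 for residues 0 to 5 and 25 for residues 6 to 9, because the six leftover values 250 to 255 map onto 0 to 5.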
Upvotes: 2
Reputation: 308206
The bias grows as the modulus increases and shrinks as the maximum random number increases. In this case 10 is very small compared to the largest random number, so the bias is almost immeasurable.
To see the bias more clearly, use fewer of the bits that random() returns:
int random_value = random() & 0xfff;
array[random_value % 10]++;
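The size of the bias in both cases can be worked out exactly. The sketch below assumes every value in 0 .. range-1 is equally likely, and takes the POSIX range of random() as 0 to 2^31-1; `hits` is a hypothetical helper name used only for illustration:

```cpp
#include <cstdint>

// Exact number of values in 0 .. range-1 that reduce to `residue` modulo `m`.
// Hypothetical helper; assumes every value in the range is equally likely.
std::uint64_t hits(std::uint64_t range, std::uint64_t m, std::uint64_t residue)
{
    std::uint64_t q = range / m;  // complete cycles through 0 .. m-1
    std::uint64_t r = range % m;  // leftover values, covering residues 0 .. r-1
    return q + (residue < r ? 1 : 0);
}
```

For the full range, `hits(2147483648, 10, 0)` is 214748365 and `hits(2147483648, 10, 9)` is 214748364, a relative difference of roughly 5 parts in a billion. Over the question's 2147483647 draws that shifts the expected counts by about one, while ordinary random fluctuation in each counter is on the order of 14,000 (assuming independent draws), so the bias is buried in noise. With the 0xfff mask the range is 4096, so `hits(4096, 10, 5)` is 410 and `hits(4096, 10, 6)` is 409; over the same number of draws the expected counts for 0 to 5 exceed those for 6 to 9 by roughly 524,000, which is easy to see.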
Upvotes: 3
Reputation: 217275
For example, suppose that random returns an unsigned char, so a value in [0; 255]. Now if we use modulo % 10, we will get slightly more of 0, 1, 2, 3, 4, 5 because of the leftover values [250; 255].
Upvotes: 2