Reputation: 37834
I've often heard that you should never mod the result of your random number generator if you want a uniform distribution. However, I've seen that using a std::uniform_int_distribution
makes no difference for significantly small ranges.
Below is an example using both mod and uniform_int_distribution
for values 0 - 15:
std::mt19937 gen;
gen.seed(0);
int ROWS = 6;
int COLS = 10;
std::cout << "mod: \n";
for (size_t i = 0; i < ROWS; ++i){
for (size_t j = 0; j < COLS; ++j){
std::cout << std::setw(2) << gen() % 16 << " ";
}
std::cout << "\n";
}
std::cout << "\n";
gen.seed(0);
std::uniform_int_distribution<> distrib(0, 15);
std::cout << "dist: \n";
for (size_t i = 0; i < ROWS; ++i){
for (size_t j = 0; j < COLS; ++j){
std::cout << std::setw(2) << distrib(gen) << " ";
}
std::cout << "\n";
}
results:
mod:
12 15 5 0 3 11 3 7 9 3
5 2 4 7 6 8 8 12 10 1
6 7 7 14 8 1 5 9 13 8
9 4 3 0 3 5 14 15 15 0
2 3 8 1 3 13 3 3 14 7
0 1 9 9 15 0 15 10 4 7
dist:
12 15 5 0 3 11 3 7 9 3
5 2 4 7 6 8 8 12 10 1
6 7 7 14 8 1 5 9 13 8
9 4 3 0 3 5 14 15 15 0
2 3 8 1 3 13 3 3 14 7
0 1 9 9 15 0 15 10 4 7
I guess it has something to do with 2 bytes? I'm just wondering how this is valid mathematically since its stepping through the random number generator and modding results. Does this mean mod creates a uniform distribution if the range is small enough? And why a 2 byte range and not more?
Upvotes: 0
Views: 97
Reputation: 32732
Using the modulo operator will frequently introduce a bias into the returned results when the number of unique values returned by your source of random bits is not a multiple of the divisor.
As a simple example, if your random source returns 4 bits (0-15) and you want values in the range 0-2, using gen() % N
you'll get 6 0
s, 5 1
s, and 5 2
s. This biases your results to the low side.
Using multiply-then-divide (gen() * N / RANGE
) can still leave an imbalance in the specific number of each result returned, but the imbalance will be spread out evenly among the results which reduces or eliminates the low bias. It also has to contend with overflow in the multiplication. With the previous example, you'll get 5 0
s, 6 1
s, and 5 0
s.
A third alternative would be to check the returned bits to see if the value is among the highest result (that would result in the bias) and regenerate the random bits if this is the case. This introduces a conditional in the code and the time to generate a random number is open ended (rather than fixed).
Upvotes: 2