Reputation: 1587
I generate a series of random numbers in parallel, but depending on what number of threads I invoke, I get a different result. From that I conclude that I have made an error somewhere!
NOTE I am using the same seed, which is independent of the number of threads -- so the results should be the same!
Here is the MWE, which generates numbers between 0 and 1 and increments a counter whenever the generated value is larger than 0.5:
#include <iostream>
#include <fstream>
#include <vector>
#include <cmath>
#include "omp.h"
#include <random>

typedef std::uniform_real_distribution<double> distr_uni;

#define max_threads 1

using namespace std;

int main(int argc, char* argv[])
{
    int reservoir_counter, accepted_tube = 0;
    double r;

    omp_set_num_threads(max_threads);
    #pragma omp parallel
    {
        mt19937 eng(0);
        distr_uni uniform(0, 1);
        #pragma omp for private(r, reservoir_counter) reduction(+:accepted_tube)
        for(reservoir_counter = 0; reservoir_counter < 100000; reservoir_counter++)
        {
            r = uniform(eng);
            if(r > 0.5)
            {
                accepted_tube++;
            }
        }
    }
    cout << accepted_tube << endl;
    return 0;
}
When I set max_threads=1 I get 50027, but when max_threads=60 (on a machine that supports it...) I get 50440.
Can someone spot the error that is apparently there? I declared the sensitive RNG and its engine within the parallel region, so it's not clear to me where the error could be.
Upvotes: 1
Views: 273
Reputation: 17946
Random number generators, especially the Mersenne Twister (which is what you're using above), are generally very serial creatures. Each call to the random number generator has the side effect of advancing its internal state. They generate a random number sequence such that the Nth call to the RNG produces a known value relative to the seed. Thus, your calls to uniform(eng) have side effects, which I believe leads to unspecified behavior in an OpenMP parallel loop. (A quick, if sloppy, Google search seems to confirm this.)
For example, IBM's documentation uses this phraseology: "No synchronization is performed when evaluating these expressions and evaluated side effects may result in indeterminate values."
You should be able to test this hypothesis easily. Write parallel and serial versions of a loop that does nothing more than this:
for (i = 0; i < N; i++)
    a[i] = uniform(eng);
This records the exact sequence of values produced by uniform(eng). For a sequential RNG in a serial loop, this order is fully deterministic: the Nth item always has the same value given the same starting seed. For the parallel version, without additional locking or other synchronization around eng, I suspect you'll get repeated values. If you add a lock, I suspect you'll get the same set of values, but in an indeterminate order.
If you want to use large numbers of random numbers deterministically in a parallel for construct, the only safe approaches I can think of off the top of my head are to pre-generate them into a large array, or to compute a hashing function of the loop index. Either approach lacks the ordering dependence that most typical RNG engines bring.
You could try wrapping the sequential RNG in a lock, but you're still going to visit random numbers in an indeterminate order even if you reliably generate the same sequence from the RNG. That's because the mapping of RNG output sequence to loop iteration numbers can change depending on the order in which the threads come to the RNG.
For this particular example (counting the number of values above 0.5), the order doesn't matter, because everything feeds into a single, huge reduction: you can freely reorder the additions. That's why OpenMP makes you call out reductions. For other computations, order may very well make a difference, including computations with no obvious sequential dependence. For example, consider stochastic dithering. Here's a really simple version. (Assume a uniform distribution over [0, 0.5).)
for (i = 0; i < N; i++)
    a[i] = (b[i] + uniform(eng) > 1) ? 1 : 0;
The exact dither you get depends on the exact mapping of random numbers to loop indices. If you use a serial RNG, then you introduce a serial dependence that the original algorithm lacked, even if you put a lock around the RNG to make it reliable.
Edit: As someone else pointed out above, in this specific case, the RNGs were declared local to the parallel block, so technically, they were not subject to races. I missed that subtlety, but it doesn't change my core point. The basic issue remains: The set of random values in the sequential program did not match the set of random values in the parallel program, because the sequential RNG was not invoked sequentially in the same way between the serial and parallel programs.
If you declare RNGs local to the parallel block so that you get T parallel instances for T threads, then each parallel thread will see the same initial sequence of random numbers. If the threads all iterate an equal number of times (which is likely), then instead of N unique random numbers, you will see N / T random numbers repeated T times.
My core points above the edit still hold. To get the same results from parallel and serial versions of the same program, the operations in the parallel loop cannot have side effects. Pulling random numbers from a serial RNG like the Mersenne Twister inherently has side effects. You either need to generate the random numbers outside the loop, or use some other method that does not have side effects. (Hashing the loop iteration number and Perlin noise are both examples.)
Upvotes: 4