Reputation: 21
I am implementing a Monte Carlo simulation, where I need to run multiple realisations of some dynamics and then take an average over the end states of all the simulations. Since the number of realisations is large, I run them in parallel using OpenMP. Every realisation starts from the same initial conditions, and at each time step a process happens with a given probability; to decide which process occurs, I draw a random number from a uniform distribution.
I want to make sure that all simulations are statistically independent and that there is no overlap in the random numbers that are being drawn.
I use OpenMP to parallelise the for loops, so the skeleton code looks like this:
vector<int> data(number_of_sims);
#pragma omp parallel for
for (int i = 0; i < number_of_sims; i++) {
    // run sim; t and r are declared inside the loop so each thread has its own copies
    double t = 0;
    while (t < T) {
        double r = draw_random_uniform();
        if (r < p) do_something();
        else do_something_else();
        t += 1.0; // increment time
    }
    // some calculation
    data[i] = calculate();
}
So every time I want a random number, I call a function that uses a Mersenne Twister seeded from std::random_device:
double draw_random_uniform(){
    static thread_local auto seed = std::random_device{}();
    static thread_local std::mt19937 mt(seed);
    std::uniform_real_distribution<double> distribution(0.0, 1.0);
    return distribution(mt);
}
However, since I ultimately want to run this code on a high-performance computing cluster, I want to avoid using std::random_device, as it is risky on systems with little entropy.
So instead I want to create an initial random number generator and then jump it forward a large amount for each of the threads. I have been attempting to do this with the Xoshiro256+ PRNG (I found a good implementation here: https://github.com/Reputeless/Xoshiro-cpp). Something like this, for example:
XoshiroCpp::Xoshiro256Plus prng(42); // properly seeded prng
#pragma omp parallel
{
    static thread_local XoshiroCpp::Xoshiro256Plus lprng(prng); // thread-local copy
    lprng.longJump(); // jump ahead
    // code as before, except use lprng to generate random numbers
    #pragma omp for
    ....
}
However, I cannot get such an implementation to work, I suspect because of the nested OpenMP constructs. I had the thought of pre-generating all of the PRNGs, storing them in a container, and then accessing the relevant one via omp_get_thread_num() inside the parallelised for loop.
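That pre-generated-container idea can be sketched as follows. To keep the sketch self-contained I inline a minimal xoshiro256+ with a `jump()` taken from Vigna's public-domain reference code (the Xoshiro-cpp library linked above already provides `jump()`/`longJump()`); `make_thread_rngs` is a hypothetical helper name, not part of any library:

```cpp
#include <cstdint>
#include <vector>

// Minimal xoshiro256+ following Vigna's reference implementation,
// inlined here only so the sketch needs no external header.
struct Xoshiro256Plus {
    uint64_t s[4];
    explicit Xoshiro256Plus(uint64_t seed) {
        // splitmix64 seeding, as recommended by the xoshiro authors
        for (auto &w : s) {
            uint64_t z = (seed += 0x9e3779b97f4a7c15ULL);
            z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
            z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
            w = z ^ (z >> 31);
        }
    }
    static uint64_t rotl(uint64_t x, int k) { return (x << k) | (x >> (64 - k)); }
    uint64_t next() {
        uint64_t result = s[0] + s[3];
        uint64_t t = s[1] << 17;
        s[2] ^= s[0]; s[3] ^= s[1]; s[1] ^= s[2]; s[0] ^= s[3];
        s[2] ^= t; s[3] = rotl(s[3], 45);
        return result;
    }
    void jump() { // advance the state by 2^128 steps
        static const uint64_t JUMP[] = {0x180ec6d33cfd0aba, 0xd5a61266f0c9392c,
                                        0xa9582618e03fc9aa, 0x39abdc4529b1661c};
        uint64_t t[4] = {0, 0, 0, 0};
        for (uint64_t j : JUMP)
            for (int b = 0; b < 64; b++) {
                if (j & (1ULL << b))
                    for (int w = 0; w < 4; w++) t[w] ^= s[w];
                next();
            }
        for (int w = 0; w < 4; w++) s[w] = t[w];
    }
};

// Hypothetical helper: one generator per thread, each copy of the
// master jumped i times, so streams are 2^128 draws apart.
std::vector<Xoshiro256Plus> make_thread_rngs(uint64_t seed, int n_threads) {
    std::vector<Xoshiro256Plus> rngs;
    Xoshiro256Plus master(seed);
    for (int i = 0; i < n_threads; i++) {
        rngs.push_back(master); // copy current (already jumped) state
        master.jump();
    }
    return rngs;
}
```

Inside the parallel region each thread would then draw from `rngs[omp_get_thread_num()]`; the container is built once, serially, before the `#pragma omp parallel`.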
I am unsure if this is the best way to go about doing all this. Any advice is appreciated.
Upvotes: 2
Views: 400
Reputation: 129
For large-scale parallel Monte Carlo simulations, Collatz-Weyl generators would be a good fit:
https://arxiv.org/abs/2312.17043
This is a two-line PRNG named CWG128 (there is also a smaller 64-bit generator):
static __uint128_t c[4]; // c[0] must be odd

__uint128_t CWG128(void){
    c[1] = (c[1] >> 1) * ((c[2] += c[1]) | 1) ^ (c[3] += c[0]);
    return c[2] >> 96 ^ c[1];
}
There is no risk of overlap because each odd value of c[0] initializes the generator on a separate, independent cycle. For full independence of the streams from the first generated number onward (generators initialized with similar numbers, e.g. 1, 3, 5, ..., may exhibit small correlations in the initial values), one of two initialization methods should be used.
In the first method you can initialize the generators with consecutive odd numbers (specifically c[0]; the remaining variables can be equal to 0), but then you have to skip the first 96 outputs. You can use this function for that purpose:
void CWG128_initializer(void){
    for (int i = 0; i < 96; i++){
        c[1] = (c[1] >> 1) * ((c[2] += c[1]) | 1) ^ (c[3] += c[0]);
    }
}
In the second method you can initialize your generators by:
c[1] = splitmix64();
c[0] = ((__uint128_t)splitmix64() << 64) | ((splitmix63() << 1) | 1);
And here are splitmix64() and splitmix63(); note that both advance the same state variable y, which is defined once:

static uint64_t y; /* The state can be seeded with any value. */

uint64_t splitmix64() {
    uint64_t z = (y += 0x9e3779b97f4a7c15);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
    z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
    return z ^ (z >> 31);
}

uint64_t splitmix63() {
    uint64_t z = (y += 0x9e3779b97f4a7c15) & 0x7fffffffffffffff;
    z = ((z ^ (z >> 30)) * 0xbf58476d1ce4e5b9) & 0x7fffffffffffffff;
    z = ((z ^ (z >> 27)) * 0x94d049bb133111eb) & 0x7fffffffffffffff;
    return z ^ (z >> 31);
}
You set y to any value, and then use:
c[1] = splitmix64();
c[0] = ((__uint128_t)splitmix64() << 64) | ((splitmix63() << 1) | 1);
to serially initialize subsequent generators. You can initialize 2^63 independent, non-overlapping streams by the splitmix method (the splitmixes cannot generate more numbers), or 2^127 independent, non-overlapping streams by initializing c[0] with 1, 3, 5, ... (and c[1] = 0, c[2] = 0, c[3] = 0) and skipping the first 96 outputs of each stream for full independence from the first number.
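To give each thread its own stream, the bookkeeping can be sketched by moving the four words of state into a struct instead of the global c[]; `cwg128_state` and `cwg128_init_stream` are illustrative names, and `__uint128_t` is a GCC/Clang extension, so this is a sketch for those compilers:

```cpp
#include <cstdint>

// Per-stream state; replaces the global c[4] so each thread owns one stream.
typedef struct { __uint128_t c[4]; } cwg128_state;

// Same update rule as the global CWG128() above, acting on one stream's state.
static __uint128_t cwg128_next(cwg128_state *s) {
    s->c[1] = (s->c[1] >> 1) * ((s->c[2] += s->c[1]) | 1) ^ (s->c[3] += s->c[0]);
    return s->c[2] >> 96 ^ s->c[1];
}

// First initialization method from above: stream k gets c[0] = 2k + 1
// (consecutive odd numbers), the rest zero, and the first 96 outputs
// are discarded for full independence from the first number.
static void cwg128_init_stream(cwg128_state *s, uint64_t k) {
    s->c[0] = ((__uint128_t)k << 1) | 1;
    s->c[1] = s->c[2] = s->c[3] = 0;
    for (int i = 0; i < 96; i++) cwg128_next(s);
}
```

Thread i would then call `cwg128_init_stream(&state, i)` once and draw from its own `state` thereafter, with no coordination between threads.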
Upvotes: 0
Reputation: 2869
You should read "Parallel Random Numbers: As Easy as 1, 2, 3" (http://www.thesalmons.org/john/random123/papers/random123sc11.pdf). This paper explicitly addresses your forward-stepping issues. You can now find implementations of these generators in maths libraries (such as Intel's MKL, which uses the specialized encryption instructions, so it will be hard to beat by hand!).
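The core idea of such counter-based generators can be illustrated in a few lines: the n-th number of stream `key` is a pure function of (key, n), so there is no state to carry, jump, or coordinate. The sketch below uses splitmix64-style bit mixing as a stand-in for the real Philox/Threefry bijections and is for illustration only, not a vetted generator:

```cpp
#include <cstdint>

// Counter-based sketch: output depends only on (key, counter).
// Simulation i uses key = i and draws counter = 0, 1, 2, ...
// The key*constant + counter combiner is deliberately simple (it is not
// injective across all (key, counter) pairs); Random123 uses a keyed
// bijection instead. The mixing steps themselves are bijective on 64 bits.
uint64_t counter_rng(uint64_t key, uint64_t counter) {
    uint64_t z = key * 0x9e3779b97f4a7c15ULL + counter;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}
```

Because the call is stateless, it is trivially parallel and fully reproducible: any thread can regenerate any draw of any simulation from (key, counter) alone.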
Upvotes: 0
Reputation: 2860
Coordinating random number generators with long jumps can be tricky. Alternatively, there is a much simpler method.
Here is a quote from the author's website:

It is however important that the period is long enough. Moreover, if you run n independent computations starting at random seeds, the sequences used by each computation should not overlap. Now, given a generator with period P, the probability that subsequences of length L starting at random points in the state space overlap is bounded by n²L/P. If your generator has period 2^256 and you run on 2^64 cores (you will never have them) a computation using 2^64 pseudorandom numbers (you will never have the time) the probability of overlap would be less than 2^-64.
So instead of trying to coordinate jumps, you could simply seed a new generator in each thread from std::random_device{}. The period is so large that the streams will practically never collide.
While this sounds like a very ad-hoc approach, random seeding is actually a widely used and classic method.
You just need to make sure the seeds are different; most platforms offer a suitable source of random seeds. If repeatability is not needed, seeding from a random source is the easiest and safest solution.
The paper by L'Ecuyer et al. from 2017 gives a good overview of methods for generating parallel streams. It calls this approach "RNG with a 'random' seed for each stream" in chapter 4.
vector<int> data(number_of_sims);
#pragma omp parallel for
for (int i = 0; i < number_of_sims; i++) {
    // random 128-bit seed, drawn independently for every simulation
    auto rd = std::random_device{};
    auto seed = std::seed_seq{rd(), rd(), rd(), rd()};
    auto mt = std::mt19937{seed};
    // run sim; t and r are local, so each thread has its own copies
    double t = 0;
    while (t < T) {
        double r = draw_random_uniform(mt);
        if (r < p) do_something();
        else do_something_else();
        t += 1.0; // increment time
    }
    // some calculation
    data[i] = calculate();
}
and
double draw_random_uniform(std::mt19937 &mt){
    std::uniform_real_distribution<double> distribution(0.0, 1.0);
    return distribution(mt);
}
If number_of_sims is not extremely large, there is no need for static or thread_local initialization.
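If number_of_sims is very large, the cost of constructing a std::random_device and re-seeding in every iteration may become noticeable; a middle ground is to seed once per thread via thread_local, as sketched below (`draw_random_uniform_tl` is an illustrative name):

```cpp
#include <random>

// Variant that seeds once per thread rather than once per simulation:
// the thread_local engine is constructed (and seeded from
// std::random_device) on each thread's first call, then reused.
double draw_random_uniform_tl() {
    thread_local std::mt19937 mt = [] {
        std::random_device rd;
        std::seed_seq seed{rd(), rd(), rd(), rd()};
        return std::mt19937(seed);
    }();
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(mt);
}
```

The same overlap-probability argument quoted above applies: with independently random per-thread seeds and a long period, the streams will practically never collide.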
Upvotes: 2