Reputation: 20143

How to properly choose rng seed for parallel processes

I'm currently working on a C/C++ project where I'm using a random number generator (gsl or boost). The whole idea can be simplified to a non-trivial stochastic process which receives a seed and returns results. I'm computing averages over different realisations of the process.

So, the seed is important: the processes must use different seeds, otherwise the averages will be biased.

So far, I'm using time(NULL) as the seed. However, if two processes start within the same second, they get the same seed. That happens because I'm running them in parallel (using OpenMP).

So, my question is: how do I implement a "seed giver" in C/C++ that gives independent seeds?

For instance, I thought of using the thread number (thread_num): seed = time(NULL)*thread_num. However, this means that the seeds are correlated: they are multiples of each other. Does that pose any problem for the pseudo-random generator, or is it as good as sequential seeds?

The requirements are that it must work on both Mac OS (my PC) and a Linux distribution similar to CentOS (the cluster), and naturally that it gives independent realisations.

Upvotes: 9

Answers (8)

swann

Reputation: 241

The way I understand your question, you have multiple processes using the same pseudo-random number generation algorithm, and you want each "stream" of random numbers (one per process) to be independent of the others. Am I correct?

In that case, you are right to suspect that giving different (correlated) seeds does not guarantee you anything unless the RNG algorithm says so. You basically have two solutions:

Simple version

Use a single source of random numbers, with a single seed. Then feed random numbers in a round-robin fashion to each process.

This solution is slow but provides some guarantee that the numbers you give to your processes are OK.

You can do the same thing by generating all the random numbers you need at once and then splitting this set into as many slices as you have processes.
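As a minimal sketch of the slicing variant, assuming std::mt19937 from the C++11 <random> header as the single source (the question mentions GSL or Boost, so this choice is only for illustration), the hypothetical helper make_slices below draws everything from one seeded engine and hands each worker a contiguous, non-overlapping slice:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draw n_workers * per_worker values from ONE seeded
// engine and split them into one contiguous slice per worker.
std::vector<std::vector<double>> make_slices(unsigned seed,
                                             std::size_t n_workers,
                                             std::size_t per_worker)
{
    assert(n_workers > 0);
    std::mt19937 engine(seed);  // the single source of random numbers
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    std::vector<std::vector<double>> slices(n_workers);
    for (std::size_t w = 0; w < n_workers; ++w) {
        slices[w].reserve(per_worker);
        // Worker w receives draws number w*per_worker .. (w+1)*per_worker - 1.
        for (std::size_t i = 0; i < per_worker; ++i)
            slices[w].push_back(uniform(engine));
    }
    return slices;
}
```

Each thread then only reads its own slice, for example slices[omp_get_thread_num()] inside an OpenMP parallel region. The cost is memory: every number has to be generated and stored before the parallel part starts.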

Use an RNG designed for that

You can find in papers and on the web several algorithms specifically designed to provide independent streams of random numbers from a single initial state. They are complicated but most provide source code. The idea is generally to "split" the RNG space (values you can obtain from the initial state) into various chunks like above. They are just faster because the algorithm used makes it possible to compute easily what would be the state of the RNG if you skipped a given number of values.

These generators are generally called "parallel random number generators". The most popular ones are probably these two:

RngStreams: http://statmath.wu.ac.at/software/RngStreams/
SPRNG: http://sprng.cs.fsu.edu/

Check their manuals to fully understand what they do, how they do it, and if it really is what you need.

Upvotes: 1

NoSenseEtAl

Reputation: 30058

Maybe you could try std::chrono high resolution clock from C++11:

Class std::chrono::high_resolution_clock represents the clock with the smallest tick period available on the system. It may be an alias of std::chrono::system_clock or std::chrono::steady_clock, or a third, independent clock.

http://en.cppreference.com/w/cpp/chrono/high_resolution_clock

BUT tbh I'm not sure that there is anything wrong with srand(0); srand(1); srand(2)... but my knowledge of rand is very, very basic. :/

For crazy safety consider this:

Note that all pseudo-random number generators described below are CopyConstructible and Assignable. Copying or assigning a generator will copy all its internal state, so the original and the copy will generate the identical sequence of random numbers.

http://www.boost.org/doc/libs/1_51_0/doc/html/boost_random/reference.html#boost_random.reference.generators

Since most of the generators have crazy long cycles, you could create one generator, copy it as the first generator, generate X numbers with the original, copy it as the second, generate X numbers with the original, copy it as the third... If your users each call their own generator fewer than X times, their sequences will not overlap.
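A rough sketch of this copy-and-advance idea, assuming std::mt19937_64 from the C++11 standard library rather than a Boost generator (the standard engines are copyable in the same way, and their discard member advances the state as if that many numbers had been drawn). The name make_offset_engines and the parameter block (the X above) are made up for the example:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: engine i starts `i * block` draws into the sequence
// produced by `seed`, so engines do not overlap as long as each one is
// called fewer than `block` times.
std::vector<std::mt19937_64> make_offset_engines(std::uint64_t seed,
                                                 std::size_t n_engines,
                                                 unsigned long long block)
{
    std::mt19937_64 original(seed);
    std::vector<std::mt19937_64> engines;
    engines.reserve(n_engines);
    for (std::size_t i = 0; i < n_engines; ++i) {
        engines.push_back(original);  // the copy carries the full state
        original.discard(block);      // advance the original by `block` draws
    }
    return engines;
}
```

For this engine, discard still has to step through the skipped values, so a very large block makes the setup slow; generators built for parallel use can jump ahead much more cheaply.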

Upvotes: 1

François Févotte

Reputation: 20268

A commonly used scheme for this is to have a "master" RNG used to generate seeds for each process-specific RNG.

The advantage of such a scheme is that the whole computation is determined by only one seed, which you can record somewhere to be able to replay any simulation (this might be useful to debug nasty bugs).
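A minimal sketch of the scheme, assuming std::mt19937 from C++11 <random> for both the master and the per-process generators (the function name make_worker_engines is hypothetical):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: one "master" engine, seeded once, hands out the seed
// of every worker engine. Recording master_seed is enough to replay a run.
std::vector<std::mt19937> make_worker_engines(
    std::mt19937::result_type master_seed, std::size_t n_workers)
{
    std::mt19937 master(master_seed);
    std::vector<std::mt19937> workers;
    workers.reserve(n_workers);
    for (std::size_t i = 0; i < n_workers; ++i)
        workers.emplace_back(master());  // one master draw seeds worker i
    return workers;
}
```

Each thread would then use workers[thread_num] only. Note that this makes the run reproducible but does not by itself prove that the resulting streams never overlap; that depends on the generator.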

Upvotes: 9

João Lopes

Reputation: 11

When we compare two infinite sequences produced by the same pseudo-random number generator with different seeds, we see that they are the same sequence shifted by some delay tau. Usually this delay is much larger than the time scale of your problem, which ensures that the two random walks are uncorrelated.

If your stochastic process is in a high dimensional phase space, I think that one good suggestion could be:

seed = MAXIMUM_INTEGER/NUMBER_OF_PARALLEL_RW*thread_num + time(NULL)

Notice that with this scheme you are not guaranteed that the delay tau is large!

If you have some knowledge of your system's time scale, you can call your random number generator some number of times in order to generate seeds that are separated by a fixed interval.

Upvotes: 1

amdn

Reputation: 11582

If you are on x86 and don't mind making the code non-portable then you could read the Time Stamp Counter (TSC) which is a 64-bit counter that increments at the CPU (max) clock rate (about 3 GHz) and use that as a seed.

#include <stdint.h>
static inline uint64_t rdtsc()
{
    uint64_t tsc;
    asm volatile
    ( 
        "rdtsc\n\t"
        "shl\t$32,%%rdx\n\t"       // rdx = TSC[ 63 : 32 ] : 0x00000000
        "add\t%%rdx,%%rax\n\t"     // rax = TSC[ 63 :  0 ]
        : "=a" (tsc) : : "%rdx"
    );
    return tsc;
}

Upvotes: 2

Benjamin Bannier

Reputation: 58684

When faced with this problem I often use seed_rng from Boost.Uuid. It uses time, clock and random data from /dev/urandom to calculate a seed. You can use it like

#include <boost/uuid/seed_rng.hpp>
#include <iostream>

int main() {
  int seed = boost::uuids::detail::seed_rng()();
  std::cout << seed << std::endl;
}

Note that seed_rng comes from a detail namespace, so it can go away without further notice. In that case writing your own implementation based on seed_rng shouldn't be too hard.

Upvotes: 3

James Kanze

Reputation: 153977

Mac OS is Unix too, so it probably has /dev/random. If so, that's the best solution for obtaining the seeds. Otherwise, if the generator is good, taking time( NULL ) once, and then incrementing it for the seed of each generator, should give reasonably good results.
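A sketch of reading a seed from the kernel's random device, assuming the non-blocking /dev/urandom is present (it is on Linux and Mac OS); the function name read_urandom_seed is made up for the example:

```cpp
#include <cstdint>
#include <fstream>

// Hypothetical helper: fill `seed` with bytes from /dev/urandom.
// Returns false (leaving the caller to fall back to something else)
// if the device cannot be opened or read.
bool read_urandom_seed(std::uint64_t& seed)
{
    std::ifstream urandom("/dev/urandom", std::ios::in | std::ios::binary);
    if (!urandom)
        return false;
    urandom.read(reinterpret_cast<char*>(&seed), sizeof seed);
    return urandom.gcount() == static_cast<std::streamsize>(sizeof seed);
}
```

Seeds obtained this way are not reproducible, so log the value if you want to be able to replay a run.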

Upvotes: 1

Gearoid Murphy

Reputation: 12116

We ran into a similar problem on a Beowulf computing grid. The solution we used was to incorporate the pid of the process into the RNG seed, like so:

time(NULL)*thread_num*getpid()

Of course, you could just read from /dev/urandom or /dev/random into an integer.
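One thing to watch with a plain product: if thread_num comes from omp_get_thread_num(), the first thread has number 0 and its seed is always 0. A possible alternative, sketched here under the assumption that a C++11 std::mt19937 is acceptable, is to pass the three values to std::seed_seq instead of multiplying them (make_engine is a hypothetical name):

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: combine time, pid and thread number through
// std::seed_seq, so a thread number of 0 does not wipe out the other inputs.
std::mt19937 make_engine(std::uint32_t time_value,
                         std::uint32_t pid,
                         std::uint32_t thread_num)
{
    std::seed_seq seq{time_value, pid, thread_num};
    return std::mt19937(seq);
}

// Typical call site (POSIX + OpenMP), shown as a comment only:
//   make_engine(time(NULL), getpid(), omp_get_thread_num());
```

This only makes identical seeds unlikely; it gives no formal guarantee about how the resulting streams relate to each other.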

Upvotes: 5
