Reputation: 275
I have an assignment and the following was mentioned:
Parallelism adds further complications, as the random number generator function must behave correctly when two or more threads call it simultaneously, i.e., it must be thread-safe.
However, I can't understand why that is a problem in the first place. Random generators are usually called with a seed as a parameter and output a number by doing multiple operations on it, I understand that we need each thread to use a different seed but other than that where does the problem come from? I have realized that spawning and calling the random generator in a parallel region instead of a serial one also really worsens performance but I can't understand why this would happen as by the looks of it, the random number generator should run concurrently without any problems since they are no dependencies.
Any help in understand the theory behind this would be appreciated.
Upvotes: 1
Views: 604
Reputation: 50518
Aside from getting wrong values from the race conditions (pointed out by @MitchWheat), the code will be less efficient because of cache-line sharing between cores on mainstream x86 processors.
Here is an example of (pretty bad but simple) 32-bit random generator (written in C):
uint32_t seed = 0xD1263AA2;
uint32_t customRandom() {
uint32_t old = seed;
seed = (uint32_t)(((uint64_t)seed * 0x2E094C40) >> 24);
return old;
}
void generateValues(uint32_t* arr, size_t size) {
for(size_t i=0 ; i<size ; ++i)
arr[i] = customRandom();
}
If you run this example in sequential (see the result here), the state seed
will likely be stored in the memory hierarchy using mainstream compilers (ie. GCC and Clang). This 32-bit block of memory will be read/written in the L1 cache which is very close to the core executing the code.
When you parallelize the loop naively, using for example #pragma omp parallel for
in OpenMP, the state is read/written concurrently by multiple threads. There is a race condition: the state value seed
can be read by multiple thread in parallel and be written in parallel. Consequently, the same value can be generated by multiple threads while the results are supposed to be random. Race condition are bad and must be fixed. You can fix that using a thread-local state here.
Assuming you do not fix the code because you want to understand the impact of the race condition on the resulting performance of this code, you should see a performance drop in parallel. The issue comes from the cache coherence protocol used by x86 mainstream processors. Indeed, seed
is shared between all the threads executed on each core so the processor will try to synchronize the cache of the core so they are kept coherent. This process is very expensive (much slower than reading/writing in the L1 cache). More specifically, when a thread on a given core write in seed
, the processor invalidate the seed
stored in all the other threads located in the caches of the other cores. Each thread must then fetch the updated seed
(typically from the much slower L3 cache) when seed
is read. You can find more information here.
Upvotes: 2