Ofey
Ofey

Reputation: 123

Why OpenMP with random number generation is slower than serial code

I'm trying use OpenMP to add parallelism to my program.

std::random_device rd;
std::mt19937 generator(rd());
std::uniform_real_distribution<float> distribution(-0.5, 0.5);

#pragma omp parallel for
for(int i = 0; i < 100000000; i++)
{
    float x = distribution(generator);
}

I tested the code on windows (Visual Studio 2010) and linux (Centos 6.5, gcc 4.9.1) on 12 core processors, and found the parallel version was slower than the serial code.

The results as below:

g++ test.cpp -o test -std=c++11 -Ofast
time ./test
real    0m1.234s
user    0m1.229s
sys 0m0.004s

g++ test.cpp -o test -fopenmp -std=c++11 -Ofast
time ./test
real    0m1.708s
user    0m24.218s
sys 0m0.010s

Why is the the OpenMP is slower than the serial code?

Upvotes: 0

Views: 1738

Answers (1)

rubenvb
rubenvb

Reputation: 76529

You're using one random number generator across many threads. Each call for a new random number will block all the other parallel calls until it is finished.

If you would profile the code, chances are all (or most) of the execution time is spent in some form of mutex locking/unlocking. This problem is called contention and your scenario would be a schoolbook example of how to cause it.

If you use std::threads and give each thread a separate rng, you will achieve pretty much 100% parallelization for that part of the code.

Some code to get you started using std::thread below. Note the use of std::ref

#include <array>
  using std::array;
#include <cstddef>
  using std::size_t;
#include <functional>
  using std::ref;
#include <iostream>
  using std::cout;
#include <iterator>
  using std::iterator_traits;
#include <thread>
  using std::thread;
#include <vector>
  using std::vector;
#include <random>
  using mersenne_twister = std::mt19937;

template<class T, T N>
array<T, N> series_of_numbers()
{
  array<T, N> arr;
  for(T i=0; i<N; ++i)
    arr[i] = i;

  return arr;
}

template<class Iterator, class Engine>
void generate_rng(Iterator begin, Iterator end, Engine& engine)
{
  std::uniform_real_distribution<double> dist;
  for(auto it = begin; it != end; ++it)
    *it = dist(engine);
}

int main()
{
  const size_t amount_of_random_numbers = 1024;
  // Engines
  const size_t Nrng = 4;
  auto seed_values = series_of_numbers<size_t, Nrng>(); // choose other seeds if you wish
  array<mersenne_twister, Nrng> engines;
  for(size_t i=0; i<Nrng; ++i)
    engines[i].seed(seed_values[i]);

  vector<thread> threads;
  vector<double> rngs(amount_of_random_numbers);

  // relevant iterators with offsets
  vector<vector<double>::iterator> begins = { rngs.begin(),
                                              rngs.begin() + amount_of_random_numbers/Nrng,
                                              rngs.begin() + 2*amount_of_random_numbers/Nrng,
                                              rngs.begin() + 3*amount_of_random_numbers/Nrng };

  vector<vector<double>::iterator> ends = { rngs.end(),
                                              rngs.end() - 3*amount_of_random_numbers/Nrng,
                                              rngs.end() - 2*amount_of_random_numbers/Nrng,
                                              rngs.end() - amount_of_random_numbers/Nrng };
  // create threads
  for(size_t n=0; n<Nrng; ++n)
    threads.emplace_back(thread(generate_rng<decltype(begins[n]), mersenne_twister>, ref(begins[n]), ref(ends[n]), ref(engines[n])));

  // join threads -> this is where the work will be done.
  for(size_t n=0; n<Nrng; ++n)
    threads[n].join();

  // rngs is filled with magical values!
  for(auto number : rngs)
    std::cout << number << '\n';
}

Live demo at Coliru. And another version where you can actually change the number of threads to any multiple of 4

Upvotes: 3

Related Questions