Reputation: 2303
To optimize the execution of some libraries I am making, I have to parallelize some calculations. Unfortunately, I can not use openmp for that, so I am trying to do some similar alternative using boost::thread. Anyone knows of some implementation like this? I have special problems with the sharing of variables between threads (to define variables as 'shared' and 'pribate' of openmp). Any sugestions?
Upvotes: 1
Views: 723
Reputation: 4275
As far as I know you'll have to do that explicitly with anything other than OpenMP.
As an example if we have a parallelized loop in OpenMP
int i;
size_t length = 10000;
int someArray[] = new int[length];
#pragma omp parallel private(i)
{
#pragma omp for schedule(dynamic, 8)
for (i = 0; i < length; ++i) {
someArray[i] = i*i;
}
}
You'll have to factor out the logic into a "generic" loop that can work on a sub-range of your problem, and then explicitly schedule the threads. Each thread will then work on a chunk of the whole problem. In that way you explicitly declare the "private" variables- the ones that go into the subProblem function.
void subProblem(int* someArray, size_t startIndex, size_t subLength) {
size_t end = startIndex+subLength;
for (size_t i = startIndex; i < end; ++i) {
someArray[i] = i*i;
}
}
void algorithm() {
size_t i;
size_t length = 10000;
int someArray[] = new int[length];
int numThreads = 4; // how to subdivide
int thread = 0;
// a vector of all threads working on the problem
std::vector<boost::thread> threadVector;
for(thread = 0; thread < numThreads; ++thread) {
// size of subproblem
size_t subLength = length / numThreads;
size_t startIndex = subLength*thread;
// use move semantics to create a thread in the vector
// requires c++11. If you can't use c++11,
// perhaps look at boost::move?
threadVector.emplace(boost::bind(subProblem, someArray, startIndex, subLength));
}
// threads are now working on subproblems
// now go through the thread vector and join with the threads.
// left as an exercise :P
}
The above is one of many scheduling algorithms- it just cuts the problem into as many chunks as you have threads.
The OpenMP way is more complicated- it cuts the problem into many small sized chunks (of 8 in my example), and then uses work-stealing scheduling to give these chunks to threads in a thread pool. The difficulty of implementing the OpenMP way, is that you need "persistent" threads that wait for work ( a thread pool ). Hope this makes sense.
An even simpler way would be to do async on every iteration (scheduling a piece of work for each iteration). This can work, if the each iteration is very expensive and takes a long time. However, if it's small pieces of work with MANY iterations, most of the overhead will go into the scheduling and thread creation, rendering the parallelization useless.
In conclusion, depending on your problem, there are be many ways to schedule the work, it's up to you to find out what works best for your problem.
TL;DR: Try Intel Threading Building Blocks (or Microsoft PPL) which schedule for you, provided you give the "sub-range" function:
http://cache-www.intel.com/cd/00/00/30/11/301132_301132.pdf#page=14
Upvotes: 4