Reputation: 3957
In my application, I have a for-loop running over roughly ten million items, like this:
int main(int argc, char* argv[])
{
    unsigned int nNodes = 10000000;
    Node** nodeList = new Node*[nNodes];
    initialiseNodes(nodeList);   // nodes are initialised here
    for (unsigned int ii = 0; ii < nNodes; ++ii)
        nodeList[ii]->update();
    showOutput(nodeList);        // show the output in some way
}
I won't go into detail about how exactly the nodes are initialised or shown. What is important is that the Node::update() method is a small method, independent of the other nodes, so it would be very advantageous to perform this for-loop in parallel. Since it is only a small thing, I wanted to stay away from OpenCL/CUDA/OpenMP this time, and used Concurrency::parallel_for from the Parallel Patterns Library instead. The code then looks like this:
#include <ppl.h>

int main(int argc, char* argv[])
{
    unsigned int nNodes = 10000000;
    Node** nodeList = new Node*[nNodes];
    initialiseNodes(nodeList);   // nodes are initialised here
    Concurrency::parallel_for(0u, nNodes, [&](unsigned int ii) {
        nodeList[ii]->update();
    });
    showOutput(nodeList);        // show the output in some way
}
This does speed up the programme, but typically only by about 20%, I found. Frankly, I expected more. Can someone tell me whether this is a typical speed-up factor when using parallel_for? Or are there ways to get more out of it (without switching to a GPU implementation)?
Upvotes: 1
Views: 760
Reputation: 3957
I found what I think contributes most heavily to the performance increase. Certainly, as @anthony-burleigh said, the task has to be parallelisable and the amount of shared data has an influence as well. What I found, however, is that the computational load of the parallelised method matters far more: big tasks give a much higher speed-up than small ones.
So for example, in:
Concurrency::parallel_for(unsigned int(0), nNodes, [&](unsigned int ii) {
nodeList[ii]->update(); // <-- very small task
});
I only got a speed-up factor of 1.2. With a heavy task, however, such as:
Concurrency::parallel_for(unsigned int(0), nNodes, [&](unsigned int ii) {
ray[ii]->recursiveRayTrace(); // <-- very heavy task
});
the programme suddenly ran three times as fast.
I am sure that there is a deeper explanation for all this (presumably the per-iteration scheduling overhead and the memory bandwidth dominate when each task does so little work), but this is what I found by trial and error.
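If the per-item work really is that small, one thing that follows from this is to give each parallel task a whole block of items instead of a single one, so the overhead is amortised over many updates. A rough, untested sketch along those lines, reusing the Node setup from the question (the block size of 1024 and the helper name updateAllNodes are arbitrary choices, not measured values):

#include <ppl.h>
#include <algorithm>   // std::min

void updateAllNodes(Node** nodeList, unsigned int nNodes)
{
    const unsigned int blockSize = 1024;                            // arbitrary grain size
    const unsigned int nBlocks   = (nNodes + blockSize - 1) / blockSize;

    Concurrency::parallel_for(0u, nBlocks, [&](unsigned int b) {
        const unsigned int begin = b * blockSize;
        const unsigned int end   = std::min(begin + blockSize, nNodes);
        for (unsigned int ii = begin; ii < end; ++ii)               // each task now does a whole block
            nodeList[ii]->update();
    });
}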
Upvotes: 0
Reputation: 352
Throwing more cores at a problem will not always yield an improvement; in the worst case it can even reduce performance. Whether you benefit from using multiple cores depends on many things, such as the amount of shared data involved. Some problems are inherently parallelizable, and some are not.
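To illustrate the shared-data point: if every iteration had to add to one shared total, contention on that single variable could easily eat up the parallel gain. With PPL you can sidestep this by giving each thread its own accumulator via Concurrency::combinable and merging the partial results at the end. A rough sketch (the energy() member is made up for the example):

#include <ppl.h>
#include <functional>   // std::plus

double totalEnergy(Node** nodeList, unsigned int nNodes)
{
    Concurrency::combinable<double> partial;                        // one accumulator per thread

    Concurrency::parallel_for(0u, nNodes, [&](unsigned int ii) {
        partial.local() += nodeList[ii]->energy();                  // hypothetical member; no sharing between threads
    });

    return partial.combine(std::plus<double>());                    // merge the per-thread sums once
}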
Upvotes: 1