Thomas Russell

Reputation: 5980

Can parallelization have a negative performance impact?

With the abundance of techniques being employed to increase parallelization in today's compiler tools (especially auto-parallelization of certain viable for-constructs, cf. the Intel C++ Compiler, Microsoft Visual Studio 2011, and various others), I wondered whether parallelization is always guaranteed to improve performance or, at worst, have no impact on it.

Are there any cases in which parallelization would have a distinctly negative impact on performance?

A quick internet search didn't yield much hope, so I decided to turn here to see if anyone has any knowledge of cases where parallelization has a detrimental impact on performance, or better yet, experience in a project where parallelization actually caused difficulties.

I am also curious about whether there are any negative performance implications of auto-vectorization, although I find it quite unlikely that there would be.

Thanks in advance!

Upvotes: 1

Views: 291

Answers (4)

Hristo Iliev

Reputation: 74375

Parallelisation usually involves some amount of data exchange between the different processing elements, since not all of them have exclusive access to all the data they need in order to complete their part of the computation. It could either be messages passed between different processes in an MPI job, or it could be synchronisation actions in a multithreaded program. Passing data around or synchronising things takes time, and that's why it is usually called communication or synchronisation overhead. There are different classes of problems depending on the ratio between overhead and computation.

Parallel algorithms that require no communication or synchronisation at all are called trivially (or "embarrassingly") parallel problems. An example of this class is a ray-tracing application: each pixel can be computed independently of all the others. Problems in this class scale linearly with the number of processing elements used (and sometimes even superlinearly because of caching effects) - give them twice as many processing elements and they will take half as much time to perform the computation.
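
For illustration, here is a minimal C++/OpenMP sketch of such a loop (my own example, not from the answer); trace_pixel() is a hypothetical stand-in for the real per-pixel work, and the point is only that no iteration reads or writes another iteration's data, so there is no communication or synchronisation overhead at all:

    // A minimal sketch (not from the answer) of an "embarrassingly parallel" loop.
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Color { float r, g, b; };

    // Hypothetical per-pixel kernel: depends only on (x, y), touches nothing shared.
    static Color trace_pixel(int x, int y) {
        float v = std::sin(0.01f * x) * std::cos(0.01f * y);
        return Color{v, 0.5f * v, 1.0f - v};
    }

    std::vector<Color> render(int width, int height) {
        std::vector<Color> image(static_cast<std::size_t>(width) * height);
        // Each pixel is computed and written exactly once, by exactly one thread.
        #pragma omp parallel for schedule(static)
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                image[static_cast<std::size_t>(y) * width + x] = trace_pixel(x, y);
        return image;
    }

    int main() {
        return render(640, 480).empty() ? 1 : 0;
    }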

If any amount of communication or synchronisation is involved, things get progressively worse as the ratio between communication/synchronisation and computation increases. This is usually the case when the problem size is kept fixed while the number of processing elements grows: the overhead increases with the number of processing elements, while the amount of computation per element decreases.
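
As a rough illustration of that trade-off, here is a small sketch assuming a deliberately simplified cost model (the constants below are made up, not measured): with the problem size N fixed, the compute share shrinks like 1/p while the overhead grows with p, so the predicted speedup flattens and eventually degrades:

    #include <cstdio>

    int main() {
        const double N   = 1e6;   // fixed total number of work items (strong scaling)
        const double t_c = 1e-8;  // hypothetical cost per work item, seconds
        const double t_s = 1e-4;  // hypothetical per-processing-element overhead, seconds

        const double serial = N * t_c;
        for (int p = 1; p <= 64; p *= 2) {
            const double parallel = (N / p) * t_c + p * t_s;  // less compute each, more overhead
            std::printf("p = %2d  predicted speedup = %.2f\n", p, serial / parallel);
        }
        return 0;
    }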

Upvotes: 3

user555045

Reputation: 64904

Auto-vectorization can theoretically fall into "traps" where the overhead of getting all the elements in the right places is actually bigger than the time saved by doing things in parallel. Analyzing how much time a piece of code will take is hard, so it's hard for compilers to make the right decision.
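
As an illustration of the kind of pattern that can trip this up (my own example, not one from the linked slides), a loop whose loads go through an index array may be vectorised with gather/shuffle instructions, and on some targets that data movement can cost more than the arithmetic the vectorisation saves:

    #include <cstdio>
    #include <cstddef>
    #include <vector>

    float gathered_sum(const std::vector<float>& data, const std::vector<int>& idx) {
        float sum = 0.0f;
        for (std::size_t i = 0; i < idx.size(); ++i)
            sum += data[static_cast<std::size_t>(idx[i])];  // indirect, non-contiguous loads
        return sum;
    }

    int main() {
        std::vector<float> data(1 << 20, 1.0f);
        std::vector<int> idx(data.size());
        for (std::size_t i = 0; i < idx.size(); ++i)
            idx[i] = static_cast<int>((i * 7919) % data.size());  // scattered access pattern
        std::printf("%f\n", gathered_sum(data, idx));
        return 0;
    }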

Towards the end of these slides are some examples and statistics about auto-vectorization making the performance worse.

Upvotes: 4

Pedro

Reputation: 1384

From a more theoretical point of view, you may be interested in problems that are not in NC, i.e. the class of decision problems decidable in polylogarithmic time on a parallel computer with a polynomial number of processors.

Off the top of my head, I cannot think of any computational problem that is not, in some way or another, parallelizable. What I have encountered many times though are problems that have been badly parallelized.

Badly parallelized programs can easily be slower than their sequential versions. This can be a result of:

  • Massive overheads due to the parallelism being too fine-grained, e.g. the amount of work performed per thread is negligible compared to the overhead of starting/scheduling the operation. In OpenMP, this could be the case of a #pragma omp parallel for schedule(dynamic,k) for a small chunk size k.
  • Repeated concurrent access to shared resources, e.g. if all threads have to wait to access some resource or memory location sequentially. In OpenMP, this can be caused by too many or too large #pragma omp critical sections.
  • Over-use of slow atomic operations to update variables shared between threads, e.g. using #pragma omp atomic where, in the sequential case, faster regular memory access would be used (see the sketch after this list).
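
A minimal sketch of the first and third pitfalls, contrasted with a version that uses OpenMP's reduction clause (my own example, not the answerer's code): sum_badly() schedules one element at a time and serialises every addition through an atomic, so it can easily be slower than serial code, while sum_better() lets each thread accumulate privately and combine once at the end.

    #include <cstdio>
    #include <cstddef>
    #include <vector>

    double sum_badly(const std::vector<double>& v) {
        double sum = 0.0;
        #pragma omp parallel for schedule(dynamic, 1)  // chunk size 1: heavy scheduling overhead
        for (long i = 0; i < static_cast<long>(v.size()); ++i) {
            #pragma omp atomic                         // every single add is serialised
            sum += v[static_cast<std::size_t>(i)];
        }
        return sum;
    }

    double sum_better(const std::vector<double>& v) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+ : sum) schedule(static)
        for (long i = 0; i < static_cast<long>(v.size()); ++i)
            sum += v[static_cast<std::size_t>(i)];     // each thread adds into its own private copy
        return sum;
    }

    int main() {
        std::vector<double> v(1 << 22, 1.0);
        std::printf("%f %f\n", sum_badly(v), sum_better(v));
        return 0;
    }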

In summary, and in my opinion, there are few inherently sequential problems, but mountains of badly-implemented parallel solutions.

Upvotes: 1

Regfor

Reputation: 8091

Used reasonably, parallelization (meaning parallel processing) usually has a positive performance impact.

But in some cases, from a developer's point of view, it can have negative effects:

  1. Allocating too many threads for parallel and/or multithreaded processing.
  2. Fork/join parallelism and loop parallelization when the work per iteration is too small, so allocating threads costs more time and resources than simply processing the items synchronously.
  3. Typical multithreading/parallel execution problems such as deadlocks, livelocks, thread starvation, race conditions, etc.
  4. Debugging and diagnostics: it is harder to find bugs.

So all of this should be used reasonably.

And some links. Sorry, they are .NET/Microsoft specific, but the problems described there are the same:

Potential Pitfalls in Data and Task Parallelism

Potential Pitfalls with Parallel LINQ (PLINQ)

A good book where common problems and pitfalls are described: Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4

Upvotes: 1
