Liberus

Reputation: 158

Can gcc make my code parallel?

I was wondering if there is an optimization in gcc that can make some single-threaded code like the example below execute in parallel. If not, why? If yes, what kinds of optimizations are possible?

#include <iostream>

int main(int argc, char *argv[])
{
    int array[10];
    for(int i = 0; i < 10; ++ i){
        array[i] = 0;
    }
    for(int i = 0; i < 10; ++ i){
        array[i] += 2;
    }
    return 0;
}

Added:

Thanks for the OpenMP links, and as much as I think it's useful, my question is about compiling the same code without the need to rewrite anything. So basically I want to know:

  1. Is it possible to make code parallel (at least in some cases) without rewriting it?
  2. If yes, what cases can be handled? If not, why?

Upvotes: 6

Views: 11835

Answers (4)

Peter Cordes

Reputation: 364428

Yes, gcc with -ftree-parallelize-loops=4 will attempt to auto-parallelize with 4 threads, for example.
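For instance, here is a minimal sketch (mine, not from gcc's documentation) of the kind of loop the auto-parallelizer targets. Whether gcc actually splits it across threads depends on the gcc version and its cost model, and you may also need -fopenmp or -lgomp at link time since the generated code calls into libgomp; -fopt-info will report what the compiler decided to do.

// Build with something like: g++ -O2 -ftree-parallelize-loops=4 main.cpp
#include <cstddef>
#include <vector>

int main() {
    const std::size_t n = 100000000;  // big enough that threading could pay off
    std::vector<double> a(n, 1.0), b(n, 2.0);

    // Independent iterations with no cross-iteration dependencies: the
    // shape of loop the auto-parallelizer looks for.
    for (std::size_t i = 0; i < n; ++i)
        a[i] = a[i] * 3.0 + b[i];

    // Read a result so the optimizer can't delete the loop entirely.
    return a[n / 2] == 5.0 ? 0 : 1;
}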

I don't know how well gcc does at auto-parallelization, but it is something that compiler developers have been working on for years. As other answers point out, giving the compiler some guidance with OpenMP pragmas can give better results. (e.g. by letting the compiler know that it doesn't matter what order something happens in, even when that may slightly change the result, which is common for floating point. Floating point math is not associative.)

Also, doing auto-parallelization only for #pragma omp loops means only the really important loops get this treatment. -ftree-parallelize-loops probably benefits from PGO (profile-guided optimization) to know which loops are actually hot and worth parallelizing and/or vectorizing.

It's somewhat related to finding the kind of parallelism that SIMD can take advantage of, for auto-vectorizing loops. (Which is enabled by default at -O3 in gcc, and at -O2 in clang).

Upvotes: 11

matthieusb

Reputation: 505

If you want to parallelize your C++ code, you can use OpenMP. Official documentation can be found here: openmp doc

OpenMP provides pragmas so that you can indicate to the compiler that a portion of code should run on a certain number of threads. You can set the thread count manually, and other pragmas let the runtime pick the number of cores to use.

The code below is an example from the official documentation:

#include <cmath>
int main() {
    const int size = 256;
    double sinTable[size];

    // Each iteration is independent, so the loop can be split across threads.
    #pragma omp parallel for
    for(int n=0; n<size; ++n) {
        sinTable[n] = std::sin(2 * M_PI * n / size); // M_PI is POSIX, not standard C++
    }
}

This code automatically parallelizes the for loop, which answers your question. OpenMP offers a lot of other possibilities; you can read the documentation to learn more.
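If you want the manual control mentioned above, here is a small sketch of mine (not from the OpenMP documentation) that fixes the thread count with the num_threads clause:

#include <cstdio>
#include <omp.h>

int main() {
    // Explicitly request 4 threads for this loop; without the clause the
    // runtime picks a count, typically one thread per available core.
    #pragma omp parallel for num_threads(4)
    for (int n = 0; n < 8; ++n) {
        std::printf("iteration %d ran on thread %d\n", n, omp_get_thread_num());
    }
}

The output lines may appear in any order, since the iterations run concurrently.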

If you need to understand how to compile with OpenMP support, see this Stack Overflow thread: openmp compilation thread.

Be careful: if you don't use OpenMP-specific compiler options (such as -fopenmp for gcc), the pragmas will simply be ignored and your code will run on a single thread.

I hope this helps.

Upvotes: 4

Dietmar Kühl

Reputation: 153840

Compilers are allowed to do whatever they want as long as the observable behavior (see 1.9 [intro.execution] paragraph 8) is identical to that specified by the [correct(*)] program. Observable behavior is specified in terms of I/O operations (using standard C++ library I/O) and accesses to volatile objects (although the compiler isn't actually required to treat volatile objects specially if it can prove that they aren't in observable memory). To this end the C++ execution system may employ parallel techniques.

Your example program actually has no observable outcome, and compilers are good at constant-folding programs to find out that a program actually does nothing. At best, the heat radiated from the CPU could be an indication of work, but the amount of energy consumed isn't one of the observable effects, i.e., the C++ execution system isn't required to produce it. If you compile the code above with clang with optimization turned on (-O2 or higher) it will actually remove the loops entirely (use the -S option to have the compiler emit assembly code so you can inspect the results reasonably easily).

Assuming you actually have loops which are forced to execute, most contemporary compilers (at least gcc, clang, and icc) will try to vectorize the code, taking advantage of SIMD instructions. To do so, the compiler needs to comprehend the operations in the code to prove that parallel execution doesn't change the results or introduce data races (as far as I can tell, the exact results are actually not necessarily retained when floating point operations are involved, as some of the compilers happily parallelize, e.g., loops adding floats although floating point addition isn't associative).
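To make that last point concrete, here is a sketch of mine (not part of the original answer): the multiply in this dot product has no cross-iteration dependency, but the accumulation into sum is a floating point reduction. Compilers typically vectorize such a reduction only under -ffast-math (or -fassociative-math), because reordering the additions changes the rounding; gcc reports its decisions with -fopt-info-vec, clang with -Rpass=loop-vectorize.

#include <cstddef>

double dot(const double* a, const double* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];  // reduction: vectorizing reorders the additions
    return sum;
}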

I'm not aware of a contemporary compiler which will utilize different threads of execution to improve the speed of execution without some form of hints like OpenMP's pragmas. However, discussions at the committee meetings imply that compiler vendors are at least considering doing so.

(*) The C++ standard imposes no restriction on the C++ execution system in case the program execution results in undefined behavior. Correct programs wouldn't invoke any form of undefined behavior.

tl;dr: compilers are allowed but not required to execute code in parallel and most contemporary compilers do so in some situations.

Upvotes: 7

RobClucas

Reputation: 855

The compiler can try to automatically parallelise your code, but it won't do it by creating threads. It may use vectorised instructions (SSE or AVX instructions on an Intel CPU, for example) to operate on multiple elements at a time, where it can detect that using those instructions is possible (for example, when you perform the same operation multiple times on consecutive elements of a correctly aligned data structure). You can help the compiler by telling it which instruction sets your CPU supports (-mavx, -msse4.2 ... for example).

You can also use these instructions directly through intrinsics, but that requires a non-trivial amount of work from the programmer. There are also libraries which do this already (see, for example, the vector class library on Agner Fog's blog).
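For illustration, here is a hand-written sketch of mine (x86-specific, SSE2) analogous to the question's second loop: it adds 2 to four ints per instruction, with a scalar tail for the remainder.

#include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64 only)

void add_two(int* a, int n) {
    const __m128i twos = _mm_set1_epi32(2);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        // Unaligned load of 4 ints, add 2 to each lane, store back.
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(a + i),
                         _mm_add_epi32(v, twos));
    }
    for (; i < n; ++i)  // remaining 0-3 elements
        a[i] += 2;
}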

You can get the compiler to parallelise using multiple threads with OpenMP (OpenMP introduction), though that is more a case of you instructing the compiler to parallelise than of the compiler auto-parallelising by itself.

Upvotes: 12
