Reputation: 932
Performance-wise, which of the following is more efficient?
Assigning in the master thread and copying the value to all threads:
int i = 0;
#pragma omp parallel for firstprivate(i)
for( ; i < n; i++){
...
}
Declaring and assigning the variable in each thread:
#pragma omp parallel for
for(int i = 0; i < n; i++){
...
}
Declaring the variable in the master thread but assigning it in each thread:
int i;
#pragma omp parallel for private(i)
for(i = 0; i < n; i++){
...
}
It may seem a silly question and/or the performance impact may be negligible. But I'm parallelizing a loop that does a small amount of computation and is called a large number of times, so any optimization I can squeeze out of this loop is helpful.
I'm looking for a more low level explanation and how OpenMP handles this.
For example, when parallelizing over a large number of threads, I assume the second implementation would be more efficient, since initializing a variable using xor is far more efficient than copying the variable to all the threads.
Upvotes: 3
Views: 1232
Reputation: 51523
There is not much difference in performance among the three versions you presented, since each of them uses #pragma omp parallel for. OpenMP therefore divides the loop iterations among the threads: the variable i becomes private to each thread, and each thread gets a different range of iterations to work with. The variable i is made private automatically in order to avoid race conditions when it is updated. Since i will be private in the parallel for anyway, there is no need to put private(i) on the #pragma omp parallel for.
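As a rough illustration of the point above, here is a minimal sketch in which the loop variable is declared in three different ways and the result is the same each time. The function names and the reduction over total are my own assumptions for the example; they are not from the question.

```c
/* Hypothetical helpers: each sums 0 + 1 + ... + (n - 1) in parallel. */

/* Loop variable declared in the for statement (version 2 of the question). */
long sum_declared_in_loop(int n) {
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        total += i;
    }
    return total;
}

/* Loop variable declared outside, with an explicit private(i) (version 3). */
long sum_private_clause(int n) {
    long total = 0;
    int i;
    #pragma omp parallel for private(i) reduction(+:total)
    for (i = 0; i < n; i++) {
        total += i;
    }
    return total;
}

/* Loop variable declared outside, no clause: the iteration variable of the
   associated loop is still treated as private by OpenMP. */
long sum_implicit_private(int n) {
    long total = 0;
    int i;
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++) {
        total += i;
    }
    return total;
}
```

With OpenMP enabled (for example -fopenmp on GCC or Clang), each function called with n = 1000 should return 499500. Without the flag the pragmas are ignored and the loops run serially, giving the same value.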
Nevertheless, your first version will produce an error, because its init-expr is empty and OpenMP expects the loop right underneath #pragma omp parallel for to have the following format:
for(init-expr; test-expr; incr-expr)
in order to precompute the range of work.
The for directive places restrictions on the structure of all associated for-loops. Specifically, all associated for-loops must have the following canonical form:
for (init-expr; test-expr; incr-expr) structured-block (OpenMP Application Program Interface, pp. 39-40.)
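One way the first version could be brought into canonical form, sketched under the assumption that the starting value lives in a separate variable (called start here, a name I made up), is to move the initialization into the init-expr instead of using firstprivate:

```c
/* Hypothetical helper: counts how many iterations run for i in [start, n).
   The start value is read in the init-expr, so the loop has the
   for (init-expr; test-expr; incr-expr) shape that OpenMP requires. */
long count_iterations(int start, int n) {
    long count = 0;
    #pragma omp parallel for reduction(+:count)
    for (int i = start; i < n; i++) {
        count += 1;
    }
    return count;
}
```

Here start is only read, never modified inside the loop, so the runtime can work out the iteration range before distributing it. For instance, count_iterations(3, 10) covers i = 3 through 9 and returns 7.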
Edit: I tested your last two versions and inspected the generated assembly. Both versions produce the same assembly, as you can see in version 2 and version 3.
Upvotes: 4