Reputation: 39
I'm studying OpenMP now, and I have a question. The work time of the following code and the same code without a parallel section is statistically equal, though all threads are accessing the function. I tried to look at some guides in the internet, but it did not help. So the question is, what is wrong with this parallel section?
int sumArrayParallel( )
{
int i = 0;
int sum = 0;
#pragma omp parallel for
for (i = 0; i < arraySize; ++i)
{
cout << omp_get_thread_num() << " ";
sum += testArray[i];
}
return sum;
}
Upvotes: 0
Views: 747
Reputation: 78316
There are two very common causes of OpenMP codes failing to exhibit improved performance over their serial counterparts:
The work being done is not sufficient to outweigh the overhead of parallel computation. Think of there being a cost, in time, for setting up a team of threads, for distributing work to them, for gathering results from them. Unless this cost is less than the time saved by parallelising the computation an OpenMP code, even if correct, will not show any speed up and may show the opposite. You haven't shown us the numbers so do the calculations on this yourself.
The programmer imposes serial operation on the parallel program, perhaps by wrapping data access inside memory fences, perhaps by accessing platform resources which are inherently serial. I suspect (but my knowledge of C is lousy) that your writing to cout
may inadvertently serialise that part of your computation.
Of course, you can have a mixture of these two problems, too much serialisation and not enough work, resulting in disappointing performance.
For further reading this page on Intel's website is useful, and not just for beginners.
I think, though, that you have a more serious problem with your code than its poor parallel performance. Does the OpenMP version produce the correct sum
? Since you have made no specific provision sum
is shared by all threads and they will race for access to it. While learning OpenMP it is a very good idea to attach the clause default(none)
to your parallel regions and to take responsibility for defining the shared/private status of each variable in each region. Then, once you are fluent in OpenMP you will know why it makes sense to continue to use the default(none)
clause.
Even if you reply Yes, the code does produce the correct result the data race exists and your program can't be trusted. Data races are funny like that, they don't show up in all the tests you run then, once you roll-out your code into production, bang ! and egg all over your face.
However, you seem to be rolling your own reduction
and OpenMP provides the tools for doing this. Investigate the reduction
clause in your OpenMP references. If I read your code correctly, and taking into account the advice above, you could rewrite the loop to
#pragma omp parallel for default(none) shared(sum, arraySize, testArray) private(i) reduction(+:sum)
for (i = 0; i < arraySize; ++i)
{
sum += testArray[i];
}
In a nutshell, using the reduction clause tells OpenMP to sort out the problems of summing a single value from work distributed across threads, avoiding race conditions etc.
Since OpenMP makes loop iteration variables private by default you could omit the clause private(i)
from the directive without too much risk. Even better though might be to declare it inside the for statement:
#pragma omp parallel for default(none) shared(sum, arraySize, testArray) reduction(+:sum)
for (int i = 0; i < arraySize; ++i)
variables declared inside parallel regions are (leaving aside some special cases) always private.
Upvotes: 4