Reputation: 13
I'm learning OpenMP and faced a weird (to me) issue with the collapse clause. My actual code is a lot longer than this, but I was able to reproduce my issue using this short version:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    size_t nrows = 10, ncols = 10;
    unsigned int *cells;
    int row, col;

    cells = calloc(nrows * ncols, sizeof *cells);

    #pragma omp parallel for
    for (row = 0; row < nrows; row++) {
        for (col = 0; col < ncols; col++)
            if (row * col % 10)
                cells[row * nrows + col] = 1;
    }

    for (row = 0; row < nrows; row++) {
        for (col = 0; col < ncols; col++)
            printf("%d ", cells[row * nrows + col]);
        printf("\n");
    }
}
I get the expected output using one thread (OMP_NUM_THREADS=1 ./test):
0 0 0 0 0 0 0 0 0 0
0 1 1 1 1 1 1 1 1 1
0 1 1 1 1 0 1 1 1 1
0 1 1 1 1 1 1 1 1 1
0 1 1 1 1 0 1 1 1 1
0 1 0 1 0 1 0 1 0 1
0 1 1 1 1 0 1 1 1 1
0 1 1 1 1 1 1 1 1 1
0 1 1 1 1 0 1 1 1 1
0 1 1 1 1 1 1 1 1 1
and its md5sum is 94ae20845c84c865dbea94918ac5f06e
. Now, if I run it using more than one thread many times, it sometimes generates different results.
$ for i in `seq 1 100`; do OMP_NUM_THREADS=2 ./test | md5sum; done | sort -u
94ae20845c84c865dbea94918ac5f06e *-
b1123bcfe82797548237998874cd0fd5 *-
Even more interesting: if I add collapse(2) to the parallel for, I get the expected output consistently.
I also tried:
#pragma omp parallel for
for (row = 0; row < nrows; row++) {
    for (col = 0; col < ncols; col++)
        if (row * col % 10)
            #pragma omp atomic write
            cells[row * nrows + col] = 1;
    #pragma omp flush
}
but it didn't help.
My only uneducated theory is that, without collapse(2), after each col loop a second thread somehow overwrites the result of the previous thread with its own initial 0 cell values, even though it should never touch those cells, because each thread updates its own portion of the cells array (row * nrows + col is unique). Is it false sharing, because multiple threads access cells that are close to each other? If so, why does it only happen without collapse(2), and why is it still safe with collapse(2)?
FYI, I used MinGW GCC from MSYS2.
The reason I want to parallelize only the outer loop is the overhead of collapsing combined with dynamic scheduling. Any help would be appreciated!
Upvotes: 0
Views: 105
Reputation: 2818
TL;DR answer: You have a data race on variable col, which does not occur if you use the collapse(2) clause, because collapse(2) privatizes both loop variables.
Details: You have to examine the sharing attributes of your loop variables (row and col) to understand what is happening here.
If the collapse(2) clause is not used,
#pragma omp parallel for
for (row = 0; row < nrows; row++) {
    for (col = 0; col < ncols; col++)
the sharing attribute of variable row is private, because the loop variable of the associated loop is implicitly privatized. Variable col, however, is shared. That means all threads use the same variable (the same memory location), which creates a race condition; this is why you sometimes obtain unexpected results.
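To make the implicit rules visible, here is a sketch of the same directive with the data-sharing clauses spelled out explicitly (using the variable names from your code); the behavior is the same as in your original version:

/* equivalent explicit form of the directive without collapse(2):
   row is the loop variable of the single associated loop, so it is
   predetermined private; col is declared outside and is therefore shared */
#pragma omp parallel for private(row) shared(col, cells, nrows, ncols)
for (row = 0; row < nrows; row++) {
    for (col = 0; col < ncols; col++)
        if (row * col % 10)
            cells[row * nrows + col] = 1;  /* all threads read and write the
                                              one shared col, so the inner
                                              loops race with each other */
}

Because every thread reads and writes the single shared col, one thread's inner loop can reset or advance the col that another thread is still using, so some cells are simply skipped.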
On the other hand, if the collapse(2) clause is used, both loop variables are privatized, so there is no race condition and you always obtain the correct result.
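For comparison, this is the collapsed variant from your question; with collapse(2) both row and col become loop variables of the combined loop nest and are therefore predetermined private:

#pragma omp parallel for collapse(2)
for (row = 0; row < nrows; row++) {
    for (col = 0; col < ncols; col++)
        if (row * col % 10)
            cells[row * nrows + col] = 1;  /* row and col are both private here */
}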
To fix your code (without using the collapse(2) clause), you have to make col private. To do so, you have two alternatives:
a) The preferred method is to define your variables in their minimum required scope. Note that variables declared in the parallel region are private by default.
#pragma omp parallel for
for (size_t row = 0; row < nrows; row++) {
    for (size_t col = 0; col < ncols; col++)
b) The alternative is to use the private clause explicitly:
#pragma omp parallel for private(col)
for (row = 0; row < nrows; row++) {
    for (col = 0; col < ncols; col++)
Note that the line cells[row * nrows + col] = 1; itself is free of data races, so the atomic operation and the flush are not necessary.
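Putting it together, a minimal sketch of the full reproducer with alternative a) applied could look like this (I kept your row * nrows + col indexing, which is equivalent to the usual row * ncols + col only because nrows == ncols, and switched the printf specifier to %u for unsigned int):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t nrows = 10, ncols = 10;
    unsigned int *cells = calloc(nrows * ncols, sizeof *cells);

    /* row and col are declared inside the construct, so every thread
       gets its own copy of both; no data race remains */
    #pragma omp parallel for
    for (size_t row = 0; row < nrows; row++) {
        for (size_t col = 0; col < ncols; col++)
            if (row * col % 10)
                cells[row * nrows + col] = 1;  /* equals row * ncols + col here,
                                                  because nrows == ncols */
    }

    for (size_t row = 0; row < nrows; row++) {
        for (size_t col = 0; col < ncols; col++)
            printf("%u ", cells[row * nrows + col]);
        printf("\n");
    }

    free(cells);
    return 0;
}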
Upvotes: 2