Reputation: 13
I'm trying to write a code for matrix multiplication. As far as I understand OMP and pararel programming this code may suffer from race condition.
#pragma omp parallel
#pragma omp for
for (int k = 0; k < size; k++){
for (int i = 0; i < size; i++) {
for (int j = 0; j < size; j++) {
c[i][j] += a[i][k] * b[k][j];
}}}
Do I get rid of it if I put #pragma omp atomic before writing to c matrix or by adding private(i) to 2nd #pragma? Also is it possible to make this code false-sharing free? If yes, how ?
Upvotes: 0
Views: 291
Reputation: 2818
A race condition occurs when 2 or more threads access the same memory location and at least one of them is writing it. Line c[i][j] +=...
can cause data race in your code. The solution is to reorder your nested loops (use the order of i
,j
,k
) and you may introduce a temporary variable to calculate the dot product:
#pragma omp parallel for
for (int i = 0; i < size; i++) {
for (int j = 0; j < size; j++) {
double tmp=0; // change its type as needed
for (int k = 0; k < size; k++){
tmp += a[i][k] * b[k][j];
}
c[i][j] = tmp; //note that += was used in your original code
}
}
Note that your code will be faster if you calculate the transpose of matrix b
. For more details read this.
UPDATE:
If you need to maintain the order of loops, there are 2 possibilities (but these solutions may be slower than the serial code):
Use atomic operation (i.e #pragma omp atomic
). In this case false sharing also can be a problem.
If your stack is large enough to store the matrix for all threads, a better alternative is to use reduction: #pragma omp parallel for reduction(+:c[:size][:size])
(Another alternative is to do the reduction manually. In this case you can allocate the matrices used for reduction on the heap.)
Upvotes: 1