Reputation: 1
The code solves the following equation:
A1(y,bp,kp) = \sum_i ( B(y,yp_i) * C(yp_i,bp,kp) * \sum_j ( D(bpp_j,kpp_j,yp_i,bp,kp) * A0(yp_i,bpp_j,kpp_j) ) )
I have the following code with multiple do-loops. The purpose is to compute an array A1 from other arrays.
do iKp = 1, nK
   do iBp = 1, nB
      do iY = 1, nY
         tempibk = 0.0_dp
         do iYp = 1, nY
            do iBK = 1, nBK
               ibpp = bpi(iYp,iBp,iKp,iBK)
               ikpp = Kpi(iYp,iBp,iKp,iBK)
               tempibk(iYp) = tempibk(iYp) + D(iYp,iBp,iKp,iBK)*A0(iYp,ibpp,ikpp)
            end do
            A1(iY,iBp,iKp) = A1(iY,iBp,iKp) + B(iY,iYp)*C(iYp,iBp,iKp)*tempibk(iYp)
         end do
      end do
   end do
end do
I want to parallelize the code with OpenMP. My question is related to the thread safety of this piece of code.
My attempt to parallelize it is the following:
!$OMP PARALLEL DO PRIVATE(iY, iBp, iYp, iKp, iBK, ibpp, ikpp, tempibk) SHARED(q1)
do iKp = 1, nK
   do iBp = 1, nB
      do iY = 1, nY
         tempibk = 0.0_dp
         do iYp = 1, nY
            do iBK = 1, nBK
               ibpp = bpi(iYp,iBp,iKp,iBK)
               ikpp = Kpi(iYp,iBp,iKp,iBK)
               tempibk(iYp) = tempibk(iYp) + D(iYp,iBp,iKp,iBK)*A0(iYp,ibpp,ikpp)
            end do
            !$OMP ATOMIC
            A1(iY,iBp,iKp) = A1(iY,iBp,iKp) + B(iY,iYp)*C(iYp,iBp,iKp)*tempibk(iYp)
            !$OMP END ATOMIC
         end do
      end do
   end do
end do
!$OMP END PARALLEL DO
My concern was that multiple threads working on A1 would create a race condition. Two threads working on different (iY, iYp, iBp, iKp) combinations might try to access and update the same element of A1 at the same time. My solution was the use of the ATOMIC directive.
Should the code inside the iBK do-loop have a critical section?
I am also wondering if tempibk needs a REDUCTION. In fact, would A1 also use a REDUCTION?
I also don't quite understand which tempibk array the thread working on a given (iKp, iBp, iY, iYp) combination "sees".
Upvotes: -1
Views: 99
Reputation: 2688
You have already asked a very similar question. I think that you should learn a bit about OpenMP to get the basic principles. OpenMP is a relatively simple way to implement parallelism; nonetheless, some learning is required.
To answer your questions:
Two threads working on different (iY, iYp, iBp, iKp) combinations might try to access and update the same element of q1 at the same time.
I assume you meant A1 and not q1 (so the clause should rather read shared(A1), but it's OK anyway because all variables are shared unless specified otherwise)?
This cannot happen. Two different threads necessarily have different iKp values, so the element A1(iY,iBp,iKp) cannot be the same in two different threads (in other terms, no race condition can occur on A1). Consequently, the atomic directive is not needed.
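For reference, here is a minimal sketch of the loop nest with the atomic directive removed (keeping your array names and private list; the shared(q1) clause can simply be dropped, since everything not listed as private is shared by default):
!$OMP PARALLEL DO PRIVATE(iY, iBp, iYp, iKp, iBK, ibpp, ikpp, tempibk)
do iKp = 1, nK
   do iBp = 1, nB
      do iY = 1, nY
         tempibk = 0.0_dp
         do iYp = 1, nY
            do iBK = 1, nBK
               ibpp = bpi(iYp,iBp,iKp,iBK)
               ikpp = Kpi(iYp,iBp,iKp,iBK)
               tempibk(iYp) = tempibk(iYp) + D(iYp,iBp,iKp,iBK)*A0(iYp,ibpp,ikpp)
            end do
            ! safe without atomic: each thread owns distinct iKp values,
            ! so no two threads ever touch the same element of A1
            A1(iY,iBp,iKp) = A1(iY,iBp,iKp) + B(iY,iYp)*C(iYp,iBp,iKp)*tempibk(iYp)
         end do
      end do
   end do
end do
!$OMP END PARALLEL DO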
Should the code inside the iBK do-loop have a critical section?
All the variables you are updating in this loop are private, so there cannot be any race condition, which means that a critical section is not needed.
I am also wondering if tempibk needs a REDUCTION.
Generally there is no need for a reduction if the content of the variable is not needed after the end of the parallel region.
In fact, would A1 also use a REDUCTION?
Since there is no race condition on A1, no reduction is needed.
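For contrast, a reduction would only be needed if all threads accumulated into one and the same variable. A purely hypothetical example (not part of your code), summing all of A1 into a single real(dp) scalar named total:
total = 0.0_dp
!$OMP PARALLEL DO REDUCTION(+:total)
do iKp = 1, nK
   ! each thread adds into its own private copy of total;
   ! the copies are combined into the shared total at the end of the loop
   total = total + sum(A1(:,:,iKp))
end do
!$OMP END PARALLEL DO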
I also don't quite understand which tempibk array the thread working on a given (iKp, iBp, iY, iYp) combination "sees".
When a variable is private, on entering the parallel region the compiler allocates as many instances of this variable as there are threads. Each instance is attached to a given thread and is visible only from this thread.
For instance
real :: temp
integer :: i, j, m
!$OMP PARALLEL DO PRIVATE(temp) NUM_THREADS(2)
do i = 1, 8
   temp = 0.0
   do j = 1, m
      temp = temp + cos(real(i+j))
   end do
   print*, i, temp
end do
!$OMP END PARALLEL DO
Thread 0 does:
real :: temp_0
integer :: i_0, j_0
do i_0 = 1, 4
   temp_0 = 0.0
   do j_0 = 1, m
      temp_0 = temp_0 + cos(real(i_0+j_0))
   end do
   print*, i_0, temp_0
end do
And thread 1 does:
real :: temp_1
integer :: i_1, j_1
do i_1 = 5, 8
   temp_1 = 0.0
   do j_1 = 1, m
      temp_1 = temp_1 + cos(real(i_1+j_1))
   end do
   print*, i_1, temp_1
end do
The *_0 and *_1 variables are actually created when entering the parallel region, and destroyed when exiting it.
Note that the indexes of all the loops inside the parallel region are private by default, so writing private(i,j) is not required.
Upvotes: 1