Zillux

Reputation: 1

OpenMP code in fortran - Reduction question

The code solves the following equation: A1(y,bp,kp) = \sum_i B(y,yp_i)*C(yp_i,bp,kp)*\sum_j D(yp_i,bp,kp,j)*A0(yp_i,bpp_j,kpp_j)

I have the following code with multiple do-loops. The purpose is to compute a Matrix A1 based on other arrays.

    do iKp = 1, nK
        do iBp = 1, nB
            do iY = 1, nY
                tempibk = 0.0_dp
                do iYp = 1, nY
                    do iBK = 1, nBK
                        ibpp = bpi(iYp,iBp,iKp,iBK)
                        ikpp = Kpi(iYp,iBp,iKp,iBK)
                        tempibk(iYp) = tempibk(iYp) + D(iYp,iBp,iKp,iBK)*A0(iYp,ibpp,ikpp)
                    end do
                    A1(iY,iBp,iKp) = A1(iY,iBp,iKp) + B(iY,iYp)*C(iYp,iBp,iKp)*tempibk(iYp)
                end do
            end do
        end do
    end do

I want to parallelize the code with OpenMP. My question is related to the thread safety of this piece of code.

My attempt to parallelize it is the following:

!$OMP PARALLEL DO PRIVATE(iY, iBp, iYp, iKp, iBK, ibpp, ikpp, tempibk) SHARED(q1)
    do iKp = 1, nK
        do iBp = 1, nB
            do iY = 1, nY
                tempibk = 0.0_dp
                do iYp = 1, nY
                    do iBK = 1, nBK
                        ibpp = bpi(iYp,iBp,iKp,iBK)
                        ikpp = Kpi(iYp,iBp,iKp,iBK)
                        tempibk(iYp) = tempibk(iYp) + D(iYp,iBp,iKp,iBK)*A0(iYp,ibpp,ikpp)
                    end do
                    !$OMP ATOMIC
                    A1(iY,iBp,iKp) = A1(iY,iBp,iKp) + B(iY,iYp)*C(iYp,iBp,iKp)*tempibk(iYp)
                    !$OMP END ATOMIC
                end do
            end do
        end do
    end do
!$OMP END PARALLEL DO

My concern was that multiple threads working on A1 would create a race condition. Two threads working on different (iY, iYp, iBp, iKp) combinations might try to access and update the same element of A1 at the same time. My solution was the use of the ATOMIC directive.

Should the code inside the iBK do-loop have a critical section?

I am also wondering if tempibk needs a REDUCTION. In fact, would A1 also use a REDUCTION?

I also don't quite understand which tempibk array the thread working on (iKp,iBp,iY,iYp) "sees".

Upvotes: -1

Views: 99

Answers (1)

PierU

Reputation: 2688

You have already asked a very similar question. I think you should learn a bit about OpenMP to grasp the basic principles. OpenMP is a relatively simple way to implement parallelism; nonetheless, some learning is required.

To answer your questions:

Two threads working on different (iY, iYp, iBp, iKp) combinations might try to access and update the same element of q1 at the same time.

I assume you meant A1 and not q1 (and accordingly shared(A1) rather than shared(q1); but it's OK either way, because all variables are shared unless specified otherwise)?

This cannot happen. Two different threads necessarily have different iKp values, so the element A1(iY,iBp,iKp) cannot be the same in two different threads (in other words, no race condition can occur on A1).

Consequently, the atomic directive is not needed.
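
Putting this together, the loop can simply be written without the atomic (a sketch reusing the names from your question; the arrays, bounds, and kind dp are assumed to be declared elsewhere):

```fortran
! same loop nest, with the unnecessary atomic removed:
! A1 is shared by default, and each thread owns distinct iKp values,
! so no two threads ever touch the same A1(iY,iBp,iKp)
!$OMP PARALLEL DO PRIVATE(iY, iBp, iYp, iBK, ibpp, ikpp, tempibk)
do iKp = 1, nK
    do iBp = 1, nB
        do iY = 1, nY
            tempibk = 0.0_dp
            do iYp = 1, nY
                do iBK = 1, nBK
                    ibpp = bpi(iYp,iBp,iKp,iBK)
                    ikpp = Kpi(iYp,iBp,iKp,iBK)
                    tempibk(iYp) = tempibk(iYp) + D(iYp,iBp,iKp,iBK)*A0(iYp,ibpp,ikpp)
                end do
                A1(iY,iBp,iKp) = A1(iY,iBp,iKp) + B(iY,iYp)*C(iYp,iBp,iKp)*tempibk(iYp)
            end do
        end do
    end do
end do
!$OMP END PARALLEL DO
```

(iKp, as the index of the parallelized loop, is private automatically and need not be listed.)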

Should the code inside the iBK do-loop have a critical section?

All the variables you are updating in this loop are private, so there cannot be any race condition. Which means that a critical section is not needed.
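
For contrast, a critical section does become necessary when several shared variables must be updated together and no reduction fits, e.g. tracking a maximum along with its location. A standalone toy example, unrelated to your arrays:

```fortran
program critical_demo
    implicit none
    integer :: i, best_i
    real :: best
    real :: vals(100)

    call random_number(vals)
    best = -huge(best)
    best_i = 0
    !$OMP PARALLEL DO
    do i = 1, 100
        ! two shared variables (best and best_i) must be updated as a pair:
        ! neither atomic nor a reduction can keep them consistent,
        ! so a critical section is required here
        !$OMP CRITICAL
        if (vals(i) > best) then
            best = vals(i)
            best_i = i
        end if
        !$OMP END CRITICAL
    end do
    !$OMP END PARALLEL DO
    print *, 'max', best, 'at index', best_i
end program critical_demo
```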

I am also wondering if tempibk needs a REDUCTION.

Generally there's no need for a reduction if the content of the variable is not needed after the end of the parallel region.
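
By contrast, a reduction is exactly what is needed when every thread accumulates into the same variable and the result is read after the parallel region. A minimal standalone sketch (not from your code):

```fortran
program reduction_demo
    implicit none
    integer :: i
    real(kind=8) :: total

    total = 0.0d0
    ! every thread accumulates into its own private copy of 'total';
    ! OpenMP sums the copies into the shared 'total' at the end of the loop
    !$OMP PARALLEL DO REDUCTION(+:total)
    do i = 1, 1000
        total = total + real(i, kind=8)
    end do
    !$OMP END PARALLEL DO

    print *, total   ! 500500
end program reduction_demo
```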

In fact, would A1 also use a REDUCTION?

Since there's no race condition on A1, no reduction is needed.
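
For illustration only: if you instead parallelized the inner iYp loop (with iKp, iBp, iY running sequentially outside), several threads would update the same element A1(iY,iBp,iKp), and a reduction (or an atomic) would then be required. A hypothetical sketch, with a private scalar temp replacing tempibk:

```fortran
! hypothetical variant: parallelism over iYp instead of iKp;
! threads now race on A1(iY,iBp,iKp), so a reduction is needed
! (Fortran allows reductions on whole arrays)
!$OMP PARALLEL DO PRIVATE(iBK, ibpp, ikpp, temp) REDUCTION(+:A1)
do iYp = 1, nY
    temp = 0.0_dp
    do iBK = 1, nBK
        ibpp = bpi(iYp,iBp,iKp,iBK)
        ikpp = Kpi(iYp,iBp,iKp,iBK)
        temp = temp + D(iYp,iBp,iKp,iBK)*A0(iYp,ibpp,ikpp)
    end do
    A1(iY,iBp,iKp) = A1(iY,iBp,iKp) + B(iY,iYp)*C(iYp,iBp,iKp)*temp
end do
!$OMP END PARALLEL DO
```

Note that reducing a whole array allocates one private copy per thread, which is one more reason to parallelize the outer iKp loop as you did.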

I also don't quite understand which tempibk array the thread working on (iKp,iBp,iY,iYp) "sees".

When a variable is private, on entering the parallel region the compiler allocates as many instances of this variable as there are threads. Each instance is attached to a given thread and is visible only from that thread.

For instance:

real :: temp
integer :: i, j, m

m = 10
!$OMP PARALLEL DO PRIVATE(temp) NUM_THREADS(2)
do i = 1, 8
   temp = 0.0
   do j = 1, m
      temp = temp + cos(real(i+j))
   end do
   print*, i, temp
end do
!$OMP END PARALLEL DO

Thread 0 does (assuming the default static scheduling):

real :: temp_0
integer :: i_0, j_0
do i_0 = 1, 4
   temp_0 = 0.0
   do j_0 = 1, m
      temp_0 = temp_0 + cos(real(i_0+j_0))
   end do
   print*, i_0, temp_0
end do

And thread 1 does:

real :: temp_1
integer :: i_1, j_1
do i_1 = 5, 8
   temp_1 = 0.0
   do j_1 = 1, m
      temp_1 = temp_1 + cos(real(i_1+j_1))
   end do
   print*, i_1, temp_1
end do

The *_0 and *_1 variables are actually created when entering the parallel region, and destroyed when exiting it.

Note that the indexes of all the loops inside the parallel region are private by default, so writing private(i,j) is not required.

Upvotes: 1
