alb_j
alb_j

Reputation: 85

Nested loop, how to parallel do on outside loop while sequently do on inside loop

A simple example:

module parameters
  implicit none
  integer :: i,step
  integer :: nt=5
  integer :: nelectron=5
  integer :: num_threads=2
  real(8) :: vt=855555.0
  real(8) :: dt=1.d-5
  real(8) :: vx1_old,vy1_old,vz1_old,t1,t2,x_old,y_old
  real(8) :: x_length=0.0
  real(8) :: y_length=0.0
  real(8) :: vx1_new,vy1_new,vz1_new,vzstore,x_new,y_new
end module parameters
program main
  use parameters
  use omp_lib
  implicit none
  integer :: thread_num

  !$ call omp_set_num_threads(num_threads)
  !$ call omp_set_nested(.false.)

  call cpu_time(t1)

  !$omp parallel
  !$omp& default(private) shared(x_length,y_length)
  !$omp& schedule(static,chunk)
  !$omp& reduction(+:x_length,y_length)
  !$omp do

  do i=1,nelectron

     do step=1,nt

        if(step==1)then           
           vx1_new=1.0
           vy1_new=1.0
           vz1_new=1.0
           x_new=1.0
           y_new=1.0 
        endif

        thread_num=omp_get_thread_num()
        write(*,*)"thread_num",thread_num
        write(*,*)"i",i
        write(*,*)"step",step
        write(*,*) 

        vx1_old=vx1_new
        vy1_old=vy1_new
        vz1_old=vz1_new
        x_old=x_new
        y_old=y_new

        x_length=x_length+x_old
        y_length=y_length+y_old
     enddo       
  enddo
  !$omp end do
  !$omp end parallel
  call cpu_time(t2)
  write(*,*)"x length=",x_length
  write(*,*)"y length=",y_length 
end program main

When I output the thread doing the actual work with i and step, I see somewhere weird:enter image description here

As you can see, thread 0 is doing the i=6,step=1 while thread 1 is doing i=6,step=2.Why it changed thread when doing the same i? and How can I avoid this kind of situation. Meaning for each i, inner loop step is done on the same thread.

Upvotes: 0

Views: 191

Answers (1)

In OpenMP only the outer most loop is parallelized, unless the collapse clause is used. That means that the whole iteration of the inner loop:

 do step=1,nt

    if(step==1)then           
       vx1_new=1.0
       vy1_new=1.0
       vz1_new=1.0
       x_new=1.0
       y_new=1.0 
    endif

    thread_num=omp_get_thread_num()
    write(*,*)"thread_num",thread_num
    write(*,*)"i",i
    write(*,*)"step",step
    write(*,*) 

    vx1_old=vx1_new
    vy1_old=vy1_new
    vz1_old=vz1_new
    x_old=x_new
    y_old=y_new

    x_length=x_length+x_old
    y_length=y_length+y_old
 enddo    

is done by a single thread sequentially with a constant i. The thread gets a value of i and does the whole piece I quoted above for that value of i without any interaction with neighbouring threads.

However, as I pointed out in the comment, your syntax of the OpenMP directives is wrong. In this specific case it causes a race condition instead of the correct reduction for x_length and y_length. It AFAIK does not cause the problem you suspected to exist.

You should do

  !$omp parallel &
  !$omp& default(private)  &
  !$omp& shared(x_length,y_length)

  !$omp do schedule(static,5) reduction(+:x_length,y_length)

or just avoid the tricky line continuations

  !$omp parallel default(private) shared(x_length,y_length)

  !$omp do schedule(static,5) reduction(+:x_length,y_length)

As I commented above, do not trust output from parallel programs unless you take care of the ordering somehow and never use cpu_time() for parallel programs.

Upvotes: 1

Related Questions