Reputation: 85
A simple example:
module parameters
implicit none
integer :: i,step
integer :: nt=5
integer :: nelectron=5
integer :: num_threads=2
real(8) :: vt=855555.0
real(8) :: dt=1.d-5
real(8) :: vx1_old,vy1_old,vz1_old,t1,t2,x_old,y_old
real(8) :: x_length=0.0
real(8) :: y_length=0.0
real(8) :: vx1_new,vy1_new,vz1_new,vzstore,x_new,y_new
end module parameters
program main
use parameters
use omp_lib
implicit none
integer :: thread_num
!$ call omp_set_num_threads(num_threads)
!$ call omp_set_nested(.false.)
call cpu_time(t1)
!$omp parallel
!$omp& default(private) shared(x_length,y_length)
!$omp& schedule(static,chunk)
!$omp& reduction(+:x_length,y_length)
!$omp do
do i=1,nelectron
do step=1,nt
if(step==1)then
vx1_new=1.0
vy1_new=1.0
vz1_new=1.0
x_new=1.0
y_new=1.0
endif
thread_num=omp_get_thread_num()
write(*,*)"thread_num",thread_num
write(*,*)"i",i
write(*,*)"step",step
write(*,*)
vx1_old=vx1_new
vy1_old=vy1_new
vz1_old=vz1_new
x_old=x_new
y_old=y_new
x_length=x_length+x_old
y_length=y_length+y_old
enddo
enddo
!$omp end do
!$omp end parallel
call cpu_time(t2)
write(*,*)"x length=",x_length
write(*,*)"y length=",y_length
end program main
When I output the thread doing the actual work with i
and step
, I see somewhere weird:
As you can see, thread 0 is doing the i=6,step=1
while thread 1 is doing i=6,step=2
.Why it changed thread when doing the same i
? and How can I avoid this kind of situation. Meaning for each i
, inner loop step
is done on the same thread.
Upvotes: 0
Views: 191
Reputation: 60008
In OpenMP only the outer most loop is parallelized, unless the collapse
clause is used. That means that the whole iteration of the inner loop:
do step=1,nt
if(step==1)then
vx1_new=1.0
vy1_new=1.0
vz1_new=1.0
x_new=1.0
y_new=1.0
endif
thread_num=omp_get_thread_num()
write(*,*)"thread_num",thread_num
write(*,*)"i",i
write(*,*)"step",step
write(*,*)
vx1_old=vx1_new
vy1_old=vy1_new
vz1_old=vz1_new
x_old=x_new
y_old=y_new
x_length=x_length+x_old
y_length=y_length+y_old
enddo
is done by a single thread sequentially with a constant i
. The thread gets a value of i
and does the whole piece I quoted above for that value of i
without any interaction with neighbouring threads.
However, as I pointed out in the comment, your syntax of the OpenMP directives is wrong. In this specific case it causes a race condition instead of the correct reduction for x_length
and y_length
. It AFAIK does not cause the problem you suspected to exist.
You should do
!$omp parallel &
!$omp& default(private) &
!$omp& shared(x_length,y_length)
!$omp do schedule(static,5) reduction(+:x_length,y_length)
or just avoid the tricky line continuations
!$omp parallel default(private) shared(x_length,y_length)
!$omp do schedule(static,5) reduction(+:x_length,y_length)
As I commented above, do not trust output from parallel programs unless you take care of the ordering somehow and never use cpu_time()
for parallel programs.
Upvotes: 1