Reputation: 1
The code below is a slightly altered snippet from a project I am working on, and I am having a strange parallelism problem with the test1, test2, and test3 routines in which some of the numbers come out wrong:
program vectorize_test
   implicit none
   integer, parameter :: N=6
   integer, parameter :: chunk_size=3
   integer, dimension(1:N) :: a,b,c

   ! ... main program body omitted (it calls array_setup before each test and prints i, a(i), b(i), c(i)) ...

contains

   subroutine array_setup
      implicit none
      integer :: i
      do i=1,N
         a(i)=2*i
         b(i)=i*i
         c(i)=i*i-i+2
      end do
      return
   end subroutine array_setup

   subroutine test1
      implicit none
      integer :: i
      !$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
      do i=2,N
         a(i-1)=b(i)
         c(i)=a(i)
      end do
      !$OMP end parallel do
      return
   end subroutine test1

   subroutine test2
      implicit none
      integer :: i
      !$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
      do i=2,N
         a(i-1)=b(i)
         a(i)=c(i)
      end do
      !$OMP end parallel do
      return
   end subroutine test2

   subroutine test3
      implicit none
      integer :: i
      !$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
      do i=2,N
         b(i)=a(i-1)
         a(i)=c(i)
      end do
      !$OMP end parallel do
      return
   end subroutine test3

end program vectorize_test
Below is a sample of the output with OMP_NUM_THREADS=1, which is correct. The columns are i, a(i), b(i), and c(i), and array_setup is called again before each test:
after setup
1 2 1 2
2 4 4 4
3 6 9 8
4 8 16 14
5 10 25 22
6 12 36 32
after test1
1 4 1 2
2 9 4 4
3 16 9 6
4 25 16 8
5 36 25 10
6 12 36 12
after test2
1 4 1 2
2 9 4 4
3 16 9 8
4 25 16 14
5 36 25 22
6 32 36 32
after test3
1 2 1 2
2 4 2 4
3 8 4 8
4 14 8 14
5 22 14 22
6 32 22 32
However, when I increase the thread count above 1, strange numbers appear in each of the columns and the output is incorrect. Where am I going wrong, and what can I do to fix it?
Upvotes: 0
Views: 168
Reputation: 59998
When you do
!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
   a(i-1)=b(i)
   c(i)=a(i)
end do
!$OMP end parallel do
one thread can read a(i) after some other thread has already overwritten it with b(i+1) (the a(i-1)=b(i) assignment of the following iteration), whereas in the serial loop the read in iteration i always happens before that write in iteration i+1. The loop iterations depend on each other, so you cannot parallelize the loop this way. You can also have one thread reading the same a(i) location that some other thread is writing at that very moment. That is also an error (a race condition).
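For this small example the serial result can still be obtained in parallel, for instance (just a sketch) by splitting the work into two loops, so that every read of the old a(i) finishes before any element of a is overwritten:
! first, all the reads of the old a values
!$OMP parallel do private(i) shared(a,c) schedule(static,chunk_size)
do i=2,N
   c(i)=a(i)
end do
!$OMP end parallel do
! then, all the writes to a
!$OMP parallel do private(i) shared(a,b) schedule(static,chunk_size)
do i=2,N
   a(i-1)=b(i)
end do
!$OMP end parallel do
The implicit barrier at the end of the first parallel do guarantees that all the reads are done before the writes start, and each loop on its own has independent iterations.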
In the loop
!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
   a(i-1)=b(i)
   a(i)=c(i)
end do
!$OMP end parallel do
the iterations are also not independent. Note that the value stored by a(i)=c(i) is overwritten by the next iteration's a(i-1)=b(i) for every element except the last one, a(N). Again, two threads may clash over the order in which these two stores are done. You can safely rewrite this as
!$OMP parallel do private(i) shared(a,b) schedule(static,chunk_size)
do i=2,N
   a(i-1)=b(i)
end do
!$OMP end parallel do
a(N)=c(N)
which produces the same result as the serial loop, but with iterations that do not depend on each other.
The third loop
!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
   b(i)=a(i-1)
   a(i)=c(i)
end do
!$OMP end parallel do
has the same kind of problem: each iteration reads a(i-1), the value computed by the previous iteration, so the loop cannot be parallelized as written. You must find a way to rewrite the algorithm so that the iterations do not depend on each other.
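For this particular loop (again just a sketch) the recurrence is shallow enough to substitute away: the a(i-1) seen by iteration i is simply c(i-1) for i > 2 and the original a(1) for i = 2, so the same result can be produced with independent iterations:
b(2)=a(1)            ! the only b value that needs the original a
!$OMP parallel do private(i) shared(b,c) schedule(static,chunk_size)
do i=3,N
   b(i)=c(i-1)       ! a(i-1) of the serial loop is just c(i-1) here
end do
!$OMP end parallel do
!$OMP parallel do private(i) shared(a,c) schedule(static,chunk_size)
do i=2,N
   a(i)=c(i)
end do
!$OMP end parallel do
In a real code the substitution may not be this simple, but the principle is the same: express each iteration in terms of values that no other iteration modifies.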
Note that there is no need for the return at the end of each subroutine. You also do not need implicit none in each subroutine if you have it in the parent scope.
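For example, with implicit none already in effect in the parent scope, array_setup can be written simply as:
subroutine array_setup
   integer :: i
   do i=1,N
      a(i)=2*i
      b(i)=i*i
      c(i)=i*i-i+2
   end do
end subroutine array_setup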
Upvotes: 2