Kristen Bun

Parallel overlap with OMP and back updating arrays in Fortran?

The code below is a slightly altered snippet from my project. I am having a strange parallel problem with the test1, test2, and test3 routines, in which the numbers are sometimes wrong:

program vectorize_test
  implicit none

  integer, parameter :: N=6
  integer, parameter :: chunk_size=3
  integer, dimension(1:N) :: a,b,c

contains

  subroutine array_setup
    implicit none
    integer :: i

    do i=1,N
       a(i)=2*i
       b(i)=i*i
       c(i)=i*i-i+2
    end do

    return
  end subroutine array_setup

  subroutine test1
    implicit none
    integer :: i

    !$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
    do i=2,N
       a(i-1)=b(i)
       c(i)=a(i)
    end do
    !$OMP end parallel do

    return
  end subroutine test1

  subroutine test2
    implicit none
    integer :: i

    !$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
    do i=2,N
       a(i-1)=b(i)
       a(i)=c(i)
    end do
    !$OMP end parallel do

    return
  end subroutine test2

  subroutine test3
    implicit none
    integer :: i

    !$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
    do i=2,N
       b(i)=a(i-1)
       a(i)=c(i)
    end do
    !$OMP end parallel do

    return
  end subroutine test3

end program vectorize_test

Below is a sample of the output when OMP_NUM_THREADS=1, which is correct (array_setup is called before each test):

 after setup
           1           2           1           2
           2           4           4           4
           3           6           9           8
           4           8          16          14
           5          10          25          22
           6          12          36          32
 after test1
           1           4           1           2
           2           9           4           4
           3          16           9           6
           4          25          16           8
           5          36          25          10
           6          12          36          12
 after test2
           1           4           1           2
           2           9           4           4
           3          16           9           8
           4          25          16          14
           5          36          25          22
           6          32          36          32
 after test3
           1           2           1           2
           2           4           2           4
           3           8           4           8
           4          14           8          14
           5          22          14          22
           6          32          22          32

However, when I increase the thread count above 1, I get strange numbers changing in each of the columns, making the output incorrect. Where am I going wrong, and what can I do to fix it?

Upvotes: 0

Views: 168

Answers (1)

When you do

!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
   a(i-1)=b(i)
   c(i)=a(i)
end do
!$OMP end parallel do

you have a race on a. In the serial loop, c(i)=a(i) reads the original value of a(i), which is only overwritten one iteration later by a(i-1)=b(i). With multiple threads, the thread executing iteration i+1 may overwrite a(i) before the thread executing iteration i has read it. The iterations depend on each other, so you can't parallelize the loop this way. Having one thread read the same a(i) location that another thread is concurrently writing is also an error (a race condition).
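If the goal is just to reproduce the serial result of this particular loop, one possible rewrite (a sketch, based on the observation that in the serial order c(i) always receives the original value of a(i)) is to finish all reads of a before any writes to it:

```fortran
! First copy the original a() values into c() ...
!$OMP parallel do private(i) shared(a,c) schedule(static,chunk_size)
do i=2,N
   c(i) = a(i)          ! reads only original values of a()
end do
!$OMP end parallel do
! ... then it is safe to overwrite a()
!$OMP parallel do private(i) shared(a,b) schedule(static,chunk_size)
do i=2,N
   a(i-1) = b(i)
end do
!$OMP end parallel do
```

Each of the two loops is now free of loop-carried dependences, so its iterations can run in any order.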

In loop

!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
   a(i-1)=b(i)
   a(i)=c(i)
end do
!$OMP end parallel do

the iterations are also not independent. Note that within each iteration a(i-1)=b(i) runs first, so the a(i)=c(i) store of iteration i is overwritten by the a(i-1)=b(i) store of iteration i+1. In the serial version only the last store, a(N)=c(N), survives, and a(1) through a(N-1) end up holding b(2) through b(N). Again, two threads may clash over the order in which these two operations are done. You can safely rewrite this as

!$OMP parallel do private(i) shared(a,b) schedule(static,chunk_size)
do i=2,N
   a(i-1)=b(i)
end do
!$OMP end parallel do
a(N)=c(N)

which matches the serial output in the question (a becomes 4, 9, 16, 25, 36, 32).

The third loop

!$OMP parallel do private(i) shared(a,b,c) schedule(static,chunk_size)
do i=2,N
   b(i)=a(i-1)
   a(i)=c(i)
end do
!$OMP end parallel do

has the same kind of problem as the first loop: b(i) reads a(i-1), which the previous iteration has just written via its a(i)=c(i) statement, so each iteration depends on the one before it. This cannot be parallelized as written. You must find a way to rewrite the algorithm so that the iterations do not depend on each other.
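In this particular case the dependence can be removed by substitution: in the serial order, a(i-1) already holds c(i-1) by the time b(i) reads it, except at i=2, which still sees the original a(1). A possible rewrite along those lines, as a sketch:

```fortran
b(2) = a(1)            ! only i=2 reads an original a() value
! for i >= 3, b(i) = a(i-1) = c(i-1) by substitution
!$OMP parallel do private(i) shared(b,c) schedule(static,chunk_size)
do i=3,N
   b(i) = c(i-1)
end do
!$OMP end parallel do
!$OMP parallel do private(i) shared(a,c) schedule(static,chunk_size)
do i=2,N
   a(i) = c(i)
end do
!$OMP end parallel do
```

Both parallel loops only read c, which is never written here, so their iterations are independent.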

Note that there is no need for the return at the end of each subroutine. You also don't need implicit none in each subroutine if you have it in the parent scope.

Upvotes: 2
