Reputation: 13
I'm attempting to parallelise some old fixed format Fortran code with OpenMP. I can't figure out how to fully parallelise the following structure of nested loops, with one outer loop and 2 sequential inner loops:
do y = 1,ny
do x = 1,nx
calculation 1
enddo
intermediate calculation (calculation 1)
do x = 1,nx
calculation 2 (intermediate calculation)
enddo
enddo
calculation 2
is such that it cannot be included in the first inner loop, and must be included in the outer loop as opposed to in a separate outer loop. This is due to a dependency on an intermediate calculation, itself dependent on all values of calculation 1
I am compiling with gfortran, and setting the environment variable OMP_NUM_THREADS=4
. The following code illustrates the 3 approaches I've tested:
c Test of using OpenMP to parallelise sequential inner loops and an
c outer loop
c Use export OMP_NUM_THREADS=4
program parallelTest
c declarations
implicit none
integer nx, ny
parameter (nx = 2, ny = 5)
integer omp_get_thread_num, i, j, A(nx,ny), B(nx,ny), C(nx,ny),
> D(nx,ny), E(nx,ny), F(nx,ny)
c initialisation
A = 7
B = 7
C = 7
D = 7
E = 7
F = 7
c attempt 1: just executes first loop, 2nd loop is ignored
c$omp parallel do shared(A,B) private(i,j) schedule(static) collapse(2)
do j = 1,ny
do i = 1,nx
A(i,j) = omp_get_thread_num()
enddo
do i = 1,nx
B(i,j) = omp_get_thread_num()
enddo
enddo
c$omp end parallel do
c attempt 2: only parallelises outer loop
c$omp parallel do shared(C,D) private(i,j) schedule(static)
do j = 1,ny
do i = 1,nx
C(i,j) = omp_get_thread_num()
enddo
do i = 1,nx
D(i,j) = omp_get_thread_num()
enddo
enddo
c$omp end parallel do
c attempt 3: only parallelises inner loops
c$omp parallel shared(E,F) private(i,j)
do j = 1,ny
c$omp do schedule(static)
do i = 1,nx
E(i,j) = omp_get_thread_num()
enddo
c$omp end do
c$omp do schedule(static)
do i = 1,nx
F(i,j) = omp_get_thread_num()
enddo
c$omp end do
enddo
c$omp end parallel
c print output to terminal
do i = 1,nx
print *, 'A(', i, ',:) = ', A(i,:)
enddo
print *
do i = 1,nx
print *, 'B(', i, ',:) = ', B(i,:)
enddo
print *
print *
do i = 1,nx
print *, 'C(', i, ',:) = ', C(i,:)
enddo
print *
do i = 1,nx
print *, 'D(', i, ',:) = ', D(i,:)
enddo
print *
print *
do i = 1,nx
print *, 'E(', i, ',:) = ', E(i,:)
enddo
print *
do i = 1,nx
print *, 'F(', i, ',:) = ', F(i,:)
enddo
end
This gives the following output:
A( 1 ,:) = 0 0 1 2 3
A( 2 ,:) = 0 1 1 2 3
B( 1 ,:) = 7 7 7 7 7
B( 2 ,:) = 7 7 7 7 7
C( 1 ,:) = 0 0 1 2 3
C( 2 ,:) = 0 0 1 2 3
D( 1 ,:) = 0 0 1 2 3
D( 2 ,:) = 0 0 1 2 3
E( 1 ,:) = 0 0 0 0 0
E( 2 ,:) = 1 1 1 1 1
F( 1 ,:) = 0 0 0 0 0
F( 2 ,:) = 1 1 1 1 1
In approach 1, the outer loop and first inner loop are parallelised as I expect, but the 2nd loop doesn't execute (presumably because the do
doesn't immediately follow a c$omp do
?). In approach 2, only the outer loop is parallelised. In approach 3, only the inner loops are parallelised.
My question is: how do I get matrix B
to be the same as matrix A
? This seems like it should be a straightforward task; I'm assuming there's an OpenMP clause or structure I'm not utilising, but would much appreciate some pointing in the right direction of what to search for (new to OpenMP!).
Upvotes: 1
Views: 192
Reputation: 2981
If the information flow is as you have described it, I would re-write your loops as
do y = 1,ny
do x = 1,nx
calculation 1
enddo
enddo
do y = 1,ny
intermediate calculation (calculation 1)
enddo
do y = 1,ny
do x = 1,nx
calculation 2 (intermediate calculation)
enddo
enddo
And then parallelise each y
loop separately.
Upvotes: 2