Reputation: 4624
This may be a simple question, but I couldn't find one on SO that asks quite the same thing. Will the collapse clause in the OpenMP code below properly handle both inner loops, or will it only collapse the outer loop with the first inner loop?
!$omp parallel do collapse(2) private(iy, ix, iz)
do iy = 1, ny
   do ix = 1, nx
      ! stuff
   enddo
   do iz = 1, nz
      ! different stuff
   enddo
enddo
!$omp end parallel do
This code compiles for me and obviously shows benefits of parallelization. However, I know that the standard says:
All loops associated with the loop construct must be perfectly nested; that is, there must be no intervening code nor any OpenMP directive between any two loops.
So my gut reaction is that OpenMP is only collapsing the first inner loop (ix). But then how is it handling the second inner loop (iz)?
What I want the code to do is the following, but writing it this way is much uglier and more verbose:
!$omp parallel private(iy, ix, iz)
!$omp do collapse(2)
do iy = 1, ny
   do ix = 1, nx
      ! stuff
   enddo
enddo
!$omp end do nowait
!$omp do collapse(2)
do iy = 1, ny
   do iz = 1, nz
      ! different stuff
   enddo
enddo
!$omp end do nowait
!$omp end parallel
Upvotes: 3
Views: 1759
Reputation: 15144
The first inner loop is code intervening between the outer loop and the second inner loop (as I understand it). If nz ≠ nx, you don't have rectangular loops. In any case, the program semantics are that the first inner loop must complete before the second inner loop begins; it might perform intermediate calculations that the second loop uses. A given implementation of OpenMP might do what you want, but I haven't attempted to test this.
Note that the second example changes the semantics of the program: all the ix loops execute, followed by all the iz loops, rather than each ix loop followed by each iz loop for the same value of iy. This should be safe if you could parallelize the ix loop, as you can only do that if none of the ix computations depend on any iz computation, but might not be as efficient if the iz loops are going to re-use the same data. So the correct semantics are going to depend on what needs to happen before a given loop can run. Do the iz loops need the ix loops to have run first for the same value of iy? If not, you might be able to use nested parallelism.
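To make the ordering point concrete, here is a minimal sketch in C (not the asker's Fortran) of one conforming alternative: drop the collapse clause and parallelize only the outer loop, so each iy iteration still runs its ix loop and then its iz loop. The function name process_rows, the array shapes, and the arithmetic are hypothetical stand-ins for "stuff" and "different stuff". It assumes that different iy iterations are independent of each other, and I have not tested it against the original code.

```c
enum { NY = 4, NX = 3, NZ = 5 };  /* hypothetical sizes, chosen so that NX != NZ */

/* Hypothetical kernel: the ix loop produces a per-row sum that the iz loop
 * for the same iy consumes, so the two inner loops must stay in order. */
void process_rows(double a[NY][NX], double b[NY][NZ])
{
    /* Only the iy loop is associated with the construct (no collapse),
     * so the loop nest does not have to be perfectly nested. */
    #pragma omp parallel for
    for (int iy = 0; iy < NY; ++iy) {
        double row_sum = 0.0;  /* declared in the loop body, so private per iteration */

        for (int ix = 0; ix < NX; ++ix) {  /* "stuff" */
            a[iy][ix] = (double)(iy + ix);
            row_sum += a[iy][ix];
        }
        for (int iz = 0; iz < NZ; ++iz) {  /* "different stuff" */
            b[iy][iz] = row_sum * iz;
        }
    }
}
```

The trade-off is that only NY iterations are shared among the threads, which may balance poorly when ny is small compared with the thread count. Built without OpenMP support, the pragma is ignored and the function runs serially with the same result.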
Note on Loop Collapsing: Loop collapsing usually means you take a nested pair of loops, such as,
for (i=0;i<100;++i)
for (j=0;j<50;++j)
And turn them into a single loop like:
for (ij=0;ij<5000;++ij)
If you have two different inner loops with different indices, you cannot do this, and furthermore, the compiler can’t automatically change the order of execution as proposed because that changes program semantics. I’m not sure what every OpenMP implementation does with this code, but I’m pretty sure that it doesn’t work the way you were hoping.
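As a rough illustration of what collapsing a rectangular 100 x 50 nest amounts to, the sketch below recovers i and j from the single index ij. The function names and the checksum body are hypothetical, made up only so that the two forms can be compared; this is not necessarily how any particular OpenMP implementation does it.

```c
enum { NI = 100, NJ = 50 };

/* Reference: the original two-level nest, visiting (i, j) in row-major order. */
long nested_checksum(void)
{
    long sum = 0;
    for (int i = 0; i < NI; ++i)
        for (int j = 0; j < NJ; ++j)
            sum += (long)i * 1000 + j;
    return sum;
}

/* Collapsed form: one loop over NI * NJ iterations, with both original
 * indices recovered from the flat index. */
long collapsed_checksum(void)
{
    long sum = 0;
    for (int ij = 0; ij < NI * NJ; ++ij) {
        int i = ij / NJ;  /* outer index */
        int j = ij % NJ;  /* inner index */
        sum += (long)i * 1000 + j;
    }
    return sum;
}
```

Both functions visit the same (i, j) pairs in the same order, so they return the same value. The mapping only works because every i has exactly NJ inner iterations belonging to a single inner loop; with two different inner loops under the same outer loop, there is no single flat index of this kind.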
Upvotes: 4