Reputation: 4624
I am trying to vectorize a nested loop using OpenMP 4.0's simd
feature, but I'm afraid I'm doing it wrong. My loops look like this:
do iy = iyfirst, iylast
  do ix = ixfirst, ixlast
    !$omp simd
    do iz = izfirst, izlast
      dudx(iz,ix,iy) = ax(1)*( u(iz,ix,iy) - u(iz,ix-1,iy) )
      do ishift = 2, ophalf
        dudx(iz,ix,iy) = dudx(iz,ix,iy) + ax(ishift)*( u(iz,ix+ishift-1,iy) - u(iz,ix-ishift,iy) )
      enddo
      dudx(iz,ix,iy) = dudx(iz,ix,iy)*buoy_x(iz,ix,iy)
    enddo
    !$omp end simd
  enddo
enddo
Note that ophalf is a small integer, usually 2 or 4, so it makes sense to vectorize the iz loop and not the innermost loop.
My question is: Do I have to mark ishift as a private variable?
In standard OpenMP parallel do loops, you certainly do need a private(ishift) to ensure threads don't stomp on each other's data. Yet when I instead rewrite the simd directive as !$omp simd private(ishift), I get the ifort compilation error:
error #8592: Within a SIMD region, a DO-loop control-variable must not be specified in a PRIVATE SIMD clause. [ISHIFT]
Looking online, I couldn't find any successful resolution of this question. It seems to me that ishift should be private, but the compiler is not allowing it. Is an inner-loop variable automatically forced to be private?
Follow-up question: Later, when I add an omp parallel do around the iy loop, should I include a private(ishift) clause in the omp parallel do directive, the omp simd directive, or both?
Thanks for any clarifications.
Upvotes: 4
Views: 1028
Reputation: 354
In a SIMD context, a private clause essentially means that each SIMD lane within the SIMD register gets its own value of ishift. That is the case when the innermost loop is the one being vectorized, because ishift is then the loop induction variable. With outer-loop vectorization, however, every SIMD lane holds a different value of the iz loop index, and for any given iz, ishift still runs through all the values from 2 to ophalf. So it does not qualify for a private clause in the SIMD context.
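One way to see this is to compare the loop from the question with a hand-unrolled version in which ishift no longer appears. The following is only a minimal sketch: the program name, the array sizes, the fixed ix, the coefficient values in ax, and ophalf = 2 are all assumptions made for illustration, and the y dimension and buoy_x factor are dropped to keep it short.

```fortran
program simd_uniform_inner
  implicit none
  ! Hypothetical sizes and coefficients, chosen only for illustration.
  integer, parameter :: nz = 16, ophalf = 2, ix = 1
  real :: u(nz, ix-ophalf:ix+ophalf-1)
  real :: ax(ophalf), d_loop(nz), d_unrolled(nz)
  integer :: iz, ishift

  call random_number(u)
  ax = [1.125, -1.0/24.0]

  ! Version with the short inner loop, as in the question.
  ! For every iz (every lane), ishift takes the same values 2..ophalf.
  !$omp simd
  do iz = 1, nz
     d_loop(iz) = ax(1)*( u(iz,ix) - u(iz,ix-1) )
     do ishift = 2, ophalf
        d_loop(iz) = d_loop(iz) + ax(ishift)*( u(iz,ix+ishift-1) - u(iz,ix-ishift) )
     enddo
  enddo
  !$omp end simd

  ! Hand-unrolled equivalent for ophalf = 2: ishift is gone entirely.
  !$omp simd
  do iz = 1, nz
     d_unrolled(iz) = ax(1)*( u(iz,ix) - u(iz,ix-1) )
     d_unrolled(iz) = d_unrolled(iz) + ax(2)*( u(iz,ix+1) - u(iz,ix-2) )
  enddo
  !$omp end simd

  ! The two results should agree up to floating-point rounding.
  print *, 'max difference:', maxval(abs(d_loop - d_unrolled))
end program simd_uniform_inner
```

Something like gfortran -fopenmp-simd or ifort -qopenmp-simd should enable the simd directives without turning on threading; without such a flag the directives are treated as comments and the program runs serially.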
With multiple threads, you want each thread to have its own copy of ishift, so that one thread incrementing the variable does not cause another thread to skip an iteration. So a private clause for ishift makes sense in the omp parallel do context. It would be interesting to check the generated code to see whether the inner loop is completely unrolled and the code vectorized along the iz loop.
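For the follow-up question, a combined version might look like the sketch below, with the private clause on the omp parallel do directive only and a bare omp simd directive. The program name, array extents, and coefficient values are hypothetical. As far as I understand the OpenMP data-sharing rules for Fortran, loop variables of sequential loops inside a parallel construct are already private by default, so the explicit clause here is a matter of clarity rather than necessity; treat that as an assumption to verify against the specification.

```fortran
program parallel_simd_sketch
  implicit none
  ! Hypothetical sizes and coefficients, chosen only for illustration.
  integer, parameter :: ophalf = 2
  integer, parameter :: nz = 32, nx = 8, ny = 4
  real :: u(nz, 1-ophalf:nx+ophalf, ny)
  real :: dudx(nz, nx, ny), buoy_x(nz, nx, ny), ax(ophalf)
  integer :: ix, iy, iz, ishift

  call random_number(u)
  call random_number(buoy_x)
  ax = [1.125, -1.0/24.0]

  ! Threads split the iy loop; each thread gets its own ix, iz, ishift.
  !$omp parallel do private(ix, iz, ishift)
  do iy = 1, ny
     do ix = 1, nx
        ! No private(ishift) here: ifort rejects it (error #8592).
        !$omp simd
        do iz = 1, nz
           dudx(iz,ix,iy) = ax(1)*( u(iz,ix,iy) - u(iz,ix-1,iy) )
           do ishift = 2, ophalf
              dudx(iz,ix,iy) = dudx(iz,ix,iy) &
                   + ax(ishift)*( u(iz,ix+ishift-1,iy) - u(iz,ix-ishift,iy) )
           enddo
           dudx(iz,ix,iy) = dudx(iz,ix,iy)*buoy_x(iz,ix,iy)
        enddo
        !$omp end simd
     enddo
  enddo
  !$omp end parallel do

  print *, 'sum(dudx) =', sum(dudx)
end program parallel_simd_sketch
```

Building once with OpenMP enabled (for example gfortran -fopenmp) and once without, then comparing the printed sum, is a quick sanity check that the threaded version computes the same result as the serial one.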
Upvotes: 0