NoseKnowsAll

Reputation: 4624

OpenMP SIMD vectorization of nested loop

I am trying to vectorize a nested loop using OpenMP 4.0's simd feature, but I'm afraid I'm doing it wrong. My loops look like this:

do iy = iyfirst, iylast
    do ix = ixfirst, ixlast

        !$omp simd
        do iz = izfirst, izlast

            dudx(iz,ix,iy) = ax(1)*( u(iz,ix,iy) - u(iz,ix-1,iy) )
            do ishift = 2, ophalf
                dudx(iz,ix,iy) = dudx(iz,ix,iy) + ax(ishift)*( u(iz,ix+ishift-1,iy) - u(iz,ix-ishift,iy) )
            enddo

            dudx(iz,ix,iy) = dudx(iz,ix,iy)*buoy_x(iz,ix,iy)

        enddo
        !$omp end simd

    enddo
enddo

Note that ophalf is a small integer, usually 2 or 4, so it makes sense to vectorize the iz loop and not the inner-most loop.

My question is: Do I have to mark ishift as a private variable?

In standard OpenMP parallel do loops, you certainly do need a private(ishift) to ensure threads don't stomp on each other's data. Yet when I instead rewrite the first line as !$omp simd private(ishift), I get the ifort compilation error:

error #8592: Within a SIMD region, a DO-loop control-variable must not be specified in a PRIVATE SIMD clause. [ISHIFT]

Looking online, I couldn't find any successful resolution of this question. It seems to me that ishift should be private, but the compiler is not allowing it. Is an inner-loop variable automatically forced to be private?

Follow-up question: Later, when I add an omp parallel do around the iy loop, should I include a private(ishift) clause in the omp parallel do directive, the omp simd directive, or both?

Thanks for any clarifications.

Upvotes: 4

Views: 1028

Answers (1)

Anoop - Intel

Reputation: 354

In a SIMD context, the private clause essentially means that the value of ishift is private to each SIMD lane within the SIMD register. That holds when the innermost loop is vectorized, since ishift is then the loop induction variable. With outer-loop vectorization, however, each SIMD lane has a different value of the iz loop index, and for any given iz, ishift still ranges from 2 to ophalf. So it doesn't qualify for the private clause in a SIMD context.

With multiple threads, you want each thread to have its own copy of ishift, so that one thread incrementing the variable doesn't cause another thread to skip an iteration. So a private clause makes sense for ishift in the omp parallel do context. It would be interesting to check the generated code to see whether the inner loop is completely unrolled and the iz loop is vectorized.
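For the follow-up question, here is a minimal sketch of how the two directives might be combined. It assumes hypothetical array sizes and stencil coefficients that are not from the question. The private clause goes on the omp parallel do directive only, and the omp simd directive is left without one. As far as I understand the OpenMP rules for Fortran, loop variables of sequential loops inside a parallel construct are already private by default, so listing ix and ishift is mostly for clarity. The program compares the result against a plain serial loop, so it can be built with or without OpenMP enabled.

```fortran
program dudx_parallel_simd
    implicit none
    ! All sizes and coefficients below are hypothetical, for illustration only.
    integer, parameter :: nz = 16, nx = 12, ny = 6, ophalf = 2
    integer, parameter :: izfirst = 1, izlast = nz
    integer, parameter :: ixfirst = ophalf + 1, ixlast = nx - ophalf + 1
    integer, parameter :: iyfirst = 1, iylast = ny
    real :: u(nz,nx,ny), buoy_x(nz,nx,ny)
    real :: dudx(nz,nx,ny), ref(nz,nx,ny)
    real :: ax(ophalf)
    integer :: ix, iy, iz, ishift

    call random_number(u)
    buoy_x = 1.5
    ax = [1.125, -0.0416667]
    dudx = 0.0
    ref = 0.0

    ! Threads split the iy loop; each thread gets its own ix, iz and ishift.
    !$omp parallel do private(ix, iz, ishift)
    do iy = iyfirst, iylast
        do ix = ixfirst, ixlast
            ! No private(ishift) here: ifort rejects it (error #8592).
            !$omp simd
            do iz = izfirst, izlast
                dudx(iz,ix,iy) = ax(1)*( u(iz,ix,iy) - u(iz,ix-1,iy) )
                do ishift = 2, ophalf
                    dudx(iz,ix,iy) = dudx(iz,ix,iy) + ax(ishift)* &
                        ( u(iz,ix+ishift-1,iy) - u(iz,ix-ishift,iy) )
                enddo
                dudx(iz,ix,iy) = dudx(iz,ix,iy)*buoy_x(iz,ix,iy)
            enddo
            !$omp end simd
        enddo
    enddo
    !$omp end parallel do

    ! Serial reference without any directives, used only as a check.
    do iy = iyfirst, iylast
        do ix = ixfirst, ixlast
            do iz = izfirst, izlast
                ref(iz,ix,iy) = ax(1)*( u(iz,ix,iy) - u(iz,ix-1,iy) )
                do ishift = 2, ophalf
                    ref(iz,ix,iy) = ref(iz,ix,iy) + ax(ishift)* &
                        ( u(iz,ix+ishift-1,iy) - u(iz,ix-ishift,iy) )
                enddo
                ref(iz,ix,iy) = ref(iz,ix,iy)*buoy_x(iz,ix,iy)
            enddo
        enddo
    enddo

    print *, 'max abs difference:', maxval(abs(dudx - ref))
    if (maxval(abs(dudx - ref)) > 1.0e-5) error stop 'results differ'
end program dudx_parallel_simd
```

Whether the compiler actually vectorizes the iz loop is best confirmed from its vectorization report rather than assumed from the directive alone.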

Upvotes: 0
