Engineer

Reputation: 8847

Knowing what SIMD instructions OpenMP 4.0 will produce?

Short of checking the actual assembly produced, is there any way to determine what platform-specific instructions will be utilised by OpenMP, for a given use case?

For example, I've identified pcmpeqq (64-bit integer equality, SSE4.1) as the desirable instruction, rather than pcmpeqd (32-bit integer equality, SSE2). Is there any way to know that OpenMP 4.0 will produce the former and not the latter? (The spec does not address such specifics.)

Upvotes: 0

Views: 151

Answers (1)

Jonathan Dursi

Reputation: 50927

The only way to guarantee that a compiler will emit a particular assembly instruction is to hardcode it. There's no spec in the world that constrains the compiler to generate specific instructions for a given language feature.
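As a rough sketch of what hardcoding might look like in C, the comparison can be requested through the SSE4.1 intrinsic _mm_cmpeq_epi64, which normally maps to pcmpeqq. The helper name eq64 and the use of int64_t are assumptions made for illustration. Note that an optimizer is still free to rewrite intrinsics, so strictly only inline assembly pins the exact opcode.

```c
#include <stdint.h>
#include <stdio.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>   /* SSE4.1 intrinsics, including _mm_cmpeq_epi64 */
#endif

/* Hypothetical helper: out[i] = (a[i] == b[i]) for n 64-bit integers. */
static void eq64(const int64_t *a, const int64_t *b, int64_t *out, int n)
{
    int i = 0;
#if defined(__SSE4_1__)
    /* Two 64-bit lanes per 128-bit register. */
    for (; i + 2 <= n; i += 2) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Each lane becomes all ones if equal, all zeros otherwise. */
        __m128i mask = _mm_cmpeq_epi64(va, vb);
        /* Logical shift turns the all-ones mask into the value 1. */
        __m128i ones = _mm_srli_epi64(mask, 63);
        _mm_storeu_si128((__m128i *)(out + i), ones);
    }
#endif
    /* Scalar tail, and the whole loop when SSE4.1 is not enabled. */
    for (; i < n; i++)
        out[i] = (a[i] == b[i]);
}
```

Built with -msse4.1 (or a wider target), the intrinsic path is compiled in. Without it, the function falls back to the scalar loop and leaves instruction selection to the compiler.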

Having said that, if support for SSE4.1 or better is specified implicitly or explicitly on the command line, it would greatly surprise me if many compilers emitted SSE2 instructions in situations where the later instructions would work.
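One cheap check that stops short of reading assembly is to ask the compiler which instruction sets it has been allowed to use. GCC and Clang define __SSE2__ and __SSE4_1__ when the corresponding target is enabled, and Intel's compiler on Linux is expected to do the same. The helper name sse_level below is an assumption for illustration. This only shows what the compiler may emit, not what it actually chose.

```c
#include <stdio.h>

/* Hypothetical helper: report the highest of the two SSE levels discussed
   here that the compiler was permitted to target for this translation unit. */
static const char *sse_level(void)
{
#if defined(__SSE4_1__)
    return "sse4.1";   /* pcmpeqq is available to the vectorizer */
#elif defined(__SSE2__)
    return "sse2";     /* only pcmpeqd and other SSE2 forms are available */
#else
    return "none";     /* non-x86 target, or SSE disabled */
#endif
}
```

Calling printf("%s\n", sse_level()) from main and building once with -msse2 and once with -msse4.1 should show the difference. With GCC or Clang, the same macros can be listed without a source file via gcc -msse4.1 -dM -E - </dev/null | grep SSE.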

Checking the assembly isn't difficult:

$ cat foo.c
#include <stdio.h>

int main(int argc, char **argv) {

    const int n=128;

    long x[n];
    long y[n];

    for (int i=0; i<n/2; i++) {
        x[i] = y[i] = 1;
        x[i+n/2] = 2;
        y[i+n/2] = 2;
    }

    #pragma omp simd
    for (int i=0; i<n; i++)
        x[i] = (x[i] == y[i]);

    for (int i=0; i<n; i++)
        printf("%d: %ld\n", i, x[i]);

    return 0;
}

$ icc -openmp -msse4.1 -o foo41.s foo.c -S -std=c99 -qopt-report-phase=vec -qopt-report=2
icc: remark #10397: optimization reports are generated in *.optrpt files in the output location
$ icc -openmp -msse2 -o foo2.s foo.c -S -std=c99 -qopt-report-phase=vec -qopt-report=2
icc: remark #10397: optimization reports are generated in *.optrpt files in the output location

And sure enough:

$ grep pcmp foo41.s
    pcmpeqq   (%rax,%rsi,8), %xmm0                          #18.25

$ grep pcmp foo2.s
    pcmpeqd   (%rax,%rsi,8), %xmm2                          #18.25

Upvotes: 3
