HerpDerpington
HerpDerpington

Reputation: 4445

(How) Can I vectorize `std::complex<double>` using openmp?

I want to optimize my application using vectorization. More specifically, I want to vectorize the mathematical operations on the std::complex<double> type. However, this seems to be quite difficult. Consider the following example:

#define TEST_LEN 100
#include <algorithm>
#include <complex>
typedef std::complex<double> cmplx;
using namespace std::complex_literals;

#pragma omp declare simd
cmplx add(cmplx a, cmplx b)
{
     return a + b;
}

#pragma omp declare simd
cmplx mult(cmplx a, cmplx b)
{
     return a * b;
}

void k(cmplx *x, cmplx *&y, int i0, int N)
{
    #pragma omp for simd
    for (int i = i0; i < N; i++)
        y[i] = add(mult(-(1i + 1.0), x[i]), 1i);
}

int main(int argc, char **argv)
{
    cmplx *x = new cmplx[TEST_LEN];
    cmplx *y = new cmplx[TEST_LEN];

    for (int i = 0; i < TEST_LEN; i++)
        x[i] = 0;

    for (int i = 0; i < TEST_LEN; i++)
    {
        int N = std::min(4, TEST_LEN - i);
        k(x, y, i, N);
    }

    delete[] x;
    delete[] y;

    return 1;
}

I am using the g++ compiler. For this code the compiler gives the following warning:

warning: unsupported return type 'cmplx' {aka 'std::complex'} for simd

for the lines containing the mult and add function. It seems like it is not possible to vectorize the std::complex<double> type like this.

Is there a different way how this can be archieved?

Upvotes: 3

Views: 504

Answers (1)

robthebloke
robthebloke

Reputation: 9682

Not easily. SIMD works quite well when you have values in the next N steps that behave the same way. So consider for example an array of 2D vectors:

X Y X Y X Y X Y

If we were to do a vector addition operation here,

X Y X Y X Y X Y
+ + + + + + + +
X Y X Y X Y X Y

The compiler will nicely vectorise that operation. If however we were to want to do something different for the X and Y values, the memory layout becomes problematic for SIMD:

X Y X Y X Y X Y
+ / + / + / + /
X Y X Y X Y X Y

If you consider for example the multiplication case:

(a + bi) (c + di) = (ac - bd)  (ad + bc)i

Suddenly the operations are jumping between SIMD lanes, which is pretty much going to kill any decent vectorization.

Take a quick look at this godbolt: https://godbolt.org/z/rnVVgl Addition boils down to some vaddps instructions (working on 8 floats at a time). Multiply ends up using vfmadd231ss and vmulss (which both work on 1 float at a time).

The only easy way to automatically vectorise your complex code would be to seperate out the real and imaginary parts into 2 arrays:

struct ComplexArray {
  float* real;
  float* imaginary;
};

Within this godbolt you can see that the compiler is now using vfmadd213ps instructions (so again back to working on 8 floats at a time).

https://godbolt.org/z/Ostaax

Upvotes: 5

Related Questions