Vectorize __float128 dot product with SIMD/AVX

Question

In C++11 (on Linux, with gcc on an Intel Xeon) I have two __float128* arrays A and B of fixed size that fit entirely in the cache. Do you know of, or can you provide, code that computes the __float128 dot product of those arrays (i.e. the sum of their element-wise products) using SIMD/AVX acceleration where possible?

Unfortunately, neither MKL nor any efficient BLAS library I know of supports __float128, so such acceleration would reduce the massive slowdown of __float128 relative to double to a point where we could actually use it.

There are numerical-stability reasons for using __float128 in our case, so anything less precise is unfortunately not an option.
