In C++11 (on Linux, with gcc, on an Intel Xeon) I have two __float128* arrays A and B of fixed size that fit entirely in the cache. Does anyone know of, or can anyone provide, code that computes the __float128 dot product of those arrays (i.e. the sum of their element-wise products), using SIMD/AVX acceleration where possible?

Unfortunately, neither MKL nor any other efficient BLAS library supports __float128 afaik, so such acceleration would reduce the massive __float128 slowdown versus double somewhat, ideally to a point where we can actually use it. There are numerical stability reasons for using __float128 in our case, so anything less precise is not an option.
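
For reference, here is a minimal sketch of the plain scalar loop that the question is about speeding up. The function name `dot_f128` is hypothetical and used only for illustration. It assumes gcc's `__float128` extension and uses nothing beyond built-in `+` and `*`, so it is a baseline to measure against, not an accelerated version.

```cpp
#include <cstddef>

// Baseline scalar dot product in quad precision.
// Hypothetical helper name; relies on gcc's __float128 extension.
// Note: on x86-64, gcc implements __float128 arithmetic in software,
// so each multiply and add below is a library call, not one instruction.
__float128 dot_f128(const __float128* a, const __float128* b, std::size_t n) {
    __float128 sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];  // accumulate element-wise products in order
    }
    return sum;
}
```

Any faster variant that reorders the summation (for example, several partial sums) may round differently from this loop, which matters if numerical stability is the reason for using `__float128` in the first place.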