Unlikus

Reputation: 1858

How to get _mm256_rcp_pd in AVX2?

For some reason _mm256_rcp_pd is not in AVX or AVX2.

AVX-512 (AVX512F together with AVX512VL) adds _mm256_rcp14_pd.

Is there a way to get a fast approximate reciprocal in double precision on AVX2? Are we supposed to convert to single precision and then back?

Upvotes: 1

Views: 291

Answers (1)

chtz

Reputation: 18827

With some integer-cast hacking and a Newton–Raphson refinement step, you can get a somewhat accurate approximation in 3 uops (an integer subtract, an FMA, and a multiply). Latency is probably not great, since this mixes integer and double operations, but it should still be much better than divpd. This solution also assumes that all inputs are normalized doubles.

#include <immintrin.h>

// Requires AVX2 (for the 64-bit integer subtract) and FMA (for fnmadd).
__m256d fastinv(__m256d y)
{
    // With this magic constant, results are exact for powers of two.
    __m256i const magic = _mm256_set1_epi64x(0x7fe0'0000'0000'0000);
    // Bit-magic: for powers of two this just inverts the exponent,
    // and values in between are linearly interpolated.
    __m256d x = _mm256_castsi256_pd(_mm256_sub_epi64(magic, _mm256_castpd_si256(y)));

    // Newton-Raphson refinement: x = x*(2.0 - x*y)
    x = _mm256_mul_pd(x, _mm256_fnmadd_pd(x, y, _mm256_set1_pd(2.0)));

    return x;
}

With the constants above, the inverse is exact for powers of two, but has an error of ~1.44% near sqrt(2).

If you fine-tune the magic constant as well as the 2.0 constant or add another NR-step, you can increase the accuracy.
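
To get a feel for how accuracy changes with the number of NR steps, here is a minimal scalar sketch that models one lane of the same trick, so it can be run without AVX2 hardware. The names `fastinv_scalar` and `max_rel_error`, the `nr_steps` parameter, and the sample count are made up for illustration and are not part of the code above. It only scans [1, 2), on the assumption that neither the input nor the result leaves the normalized range, in which case every binade behaves the same way.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Scalar model of one lane of the bit trick (hypothetical helper, for testing only).
// Assumes y is a normalized double, like the vector version.
inline double fastinv_scalar(double y, int nr_steps = 1)
{
    assert(std::isnormal(y));

    std::uint64_t const magic = 0x7fe0'0000'0000'0000ULL;

    // Reinterpret the double as an integer, subtract from the magic constant,
    // and reinterpret the result as a double again.
    std::uint64_t bits;
    std::memcpy(&bits, &y, sizeof bits);
    bits = magic - bits;
    double x;
    std::memcpy(&x, &bits, sizeof x);

    // Each Newton-Raphson step computes x = x*(2.0 - x*y);
    // std::fma(-x, y, 2.0) mirrors what _mm256_fnmadd_pd does per lane.
    for (int i = 0; i < nr_steps; ++i)
        x = x * std::fma(-x, y, 2.0);

    return x;
}

// Largest observed |x*y - 1| over evenly spaced samples in [1, 2).
inline double max_rel_error(int nr_steps, int samples = 1 << 16)
{
    double worst = 0.0;
    for (int i = 0; i < samples; ++i) {
        double const y = 1.0 + static_cast<double>(i) / samples;
        double const err = std::fabs(fastinv_scalar(y, nr_steps) * y - 1.0);
        if (err > worst)
            worst = err;
    }
    return worst;
}
```

Comparing `max_rel_error(1)` with `max_rel_error(2)` shows what a second step buys: each Newton-Raphson step roughly squares the relative error, at the cost of one more FMA and one more multiply in the vector version.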

Godbolt link: https://godbolt.org/z/f7YhnhT96

Upvotes: 2
