
Inverse square root intrinsics

Are there compiler intrinsics for the inverse square root of a scalar argument, in both single- and double-precision floating point?

I can find these for packed SIMD arguments (SSE and AVX) but not for scalars.

Anything faster than dividing by the <math.h> square root (i.e. 1.0 / sqrt(x)) is welcome as well.


Answers (1)

robthebloke


Here you go:

#include <immintrin.h>

// Wrapped in a namespace (the name is arbitrary) so these overloads
// don't clash with ::sqrt from <math.h>/<cmath>.
namespace fast {

// Same result as std::sqrt (sqrtss is correctly rounded), but never sets errno.
inline float sqrt(const float f)
{
    __m128 temp = _mm_set_ss(f);
    temp = _mm_sqrt_ss(temp);
    return _mm_cvtss_f32(temp);
}

// Faster than 1.0f / std::sqrt, but only approximate:
// relative error <= 1.5 * 2^-12 (about 12 correct bits).
inline float rsqrt(const float f)
{
    __m128 temp = _mm_set_ss(f);
    temp = _mm_rsqrt_ss(temp);
    return _mm_cvtss_f32(temp);
}

// Same result as std::sqrt (sqrtsd is correctly rounded), but never sets errno.
inline double sqrt(const double f)
{
    __m128d temp = _mm_set_sd(f);
    temp = _mm_sqrt_sd(temp, temp);
    return _mm_cvtsd_f64(temp);
}

// Same as 1.0 / std::sqrt.
// SSE/AVX have no rsqrt instruction for double (AVX-512F's _mm_rsqrt14_sd
// is also only an approximation), so 1.0 / std::sqrt is the best you've got.
inline double rsqrt(const double f)
{
    __m128d temp = _mm_set_sd(f);
    temp = _mm_div_sd(_mm_set_sd(1.0), _mm_sqrt_sd(temp, temp));
    return _mm_cvtsd_f64(temp);
}

} // namespace fast
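To get a feel for how approximate the float rsqrt is, here is a small sketch that compares the raw rsqrtss result against 1.0 / std::sqrt computed in double and reports the worst relative error over evenly spaced samples. It assumes an x86-64 target (where SSE is always available); the helper names rsqrt_approx and max_rsqrt_rel_error are hypothetical, and the 1.5 * 2^-12 bound it is checked against is the one Intel documents for rsqrtss.

```cpp
#include <immintrin.h>
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: the bare rsqrtss estimate, no refinement.
inline float rsqrt_approx(float f)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(f)));
}

// Hypothetical helper: worst relative error of rsqrt_approx over `samples`
// evenly spaced x in [lo, hi), with lo > 0. The reference 1/sqrt(x) is
// computed in double, so its own error is negligible by comparison.
inline double max_rsqrt_rel_error(float lo, float hi, int samples)
{
    double worst = 0.0;
    for (int i = 0; i < samples; ++i) {
        const float x = lo + (hi - lo) * static_cast<float>(i) / static_cast<float>(samples);
        const double exact = 1.0 / std::sqrt(static_cast<double>(x));
        const double err = std::fabs(static_cast<double>(rsqrt_approx(x)) - exact) / exact;
        worst = std::max(worst, err);
    }
    return worst;
}
```

On real hardware the worst case typically lands in the 10^-4 range, several thousand times the error of the correctly rounded 1.0f / std::sqrt.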

Comparison against std::sqrt(): https://godbolt.org/z/uufv3W

If you enable -ffast-math (or /fp:fast in MSVC), std::sqrt is likely to compile to the same code as the intrinsics anyway. The one exception is the float rsqrt: instead of the bare rsqrtss approximation, clang will turn 1.0f/std::sqrt(x) into rsqrtss followed by one Newton-Raphson iteration.
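The rsqrt + Newton-Raphson idea can be sketched by hand as below. This is not clang's exact instruction sequence, just the same technique: take the rsqrtss estimate y0 and apply one Newton-Raphson step for f(y) = 1/y^2 - x, which gives y1 = y0 * (1.5 - 0.5 * x * y0^2). If y0 has relative error e, y1 has about 1.5 * e^2, so the ~12-bit estimate becomes good to roughly 22 bits. The names rsqrt_nr and max_rel_error_nr are hypothetical, and an x86-64 target is assumed.

```cpp
#include <immintrin.h>
#include <algorithm>
#include <cassert>
#include <cmath>

// rsqrtss estimate refined by one Newton-Raphson step:
//   y1 = y0 * (1.5 - 0.5 * x * y0 * y0)
// Note: x == 0 gives NaN here (rsqrtss(0) is +inf, and 0 * inf is NaN),
// whereas 1.0f / std::sqrt(0.0f) is +inf.
inline float rsqrt_nr(float x)
{
    const float y0 = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    return y0 * (1.5f - 0.5f * x * y0 * y0);
}

// Worst relative error of rsqrt_nr over `samples` evenly spaced x in [lo, hi), lo > 0,
// measured against 1/sqrt(x) computed in double.
inline double max_rel_error_nr(float lo, float hi, int samples)
{
    double worst = 0.0;
    for (int i = 0; i < samples; ++i) {
        const float x = lo + (hi - lo) * static_cast<float>(i) / static_cast<float>(samples);
        const double exact = 1.0 / std::sqrt(static_cast<double>(x));
        worst = std::max(worst, std::fabs(static_cast<double>(rsqrt_nr(x)) - exact) / exact);
    }
    return worst;
}
```

The zero case is one reason this substitution is only done when fast math lets the compiler ignore such edge cases.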

Obviously rsqrt comes with a large floating-point error: rsqrtss only guarantees a relative error of at most 1.5 * 2^-12, i.e. roughly 12 correct bits out of float's 24. That may be OK for, say, normalising a bunch of surface normals for OpenGL rendering, but for almost everything else (e.g. the quadratic formula) the lack of accuracy makes it mostly useless.
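As an illustration of the "fine for surface normals" case, here is a minimal sketch that normalises a 3D vector with the raw rsqrtss estimate. The Vec3 type and the normalise_fast / vec_length helpers are hypothetical, and an x86-64 target is assumed; the point is that the resulting length is only guaranteed to be within about 1.5 * 2^-12 of 1, which is invisible in shading but not something to feed into further numerical work.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };  // hypothetical minimal vector type

// Normalise v using the bare rsqrtss estimate of 1/|v| (no Newton-Raphson step).
// Assumes v is non-zero.
inline Vec3 normalise_fast(Vec3 v)
{
    const float len2 = v.x * v.x + v.y * v.y + v.z * v.z;
    const float inv_len = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(len2)));
    return Vec3{v.x * inv_len, v.y * inv_len, v.z * inv_len};
}

// Exact-as-float length, for checking the result.
inline float vec_length(Vec3 v)
{
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}
```

For example, normalise_fast(Vec3{3.0f, 4.0f, 12.0f}) has length 13 * rsqrt(169), which is 1 up to the rsqrtss error.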

Any 'Quake3 optimised rsqrt' nonsense you see around will be slower than the hardware rsqrtss instruction, typically no faster than just calling std::sqrt directly on a modern CPU, and with terrible accuracy.
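For reference, this is the well-known "Quake III" trick, shown only so its accuracy can be measured against 1/sqrt. It is written with std::memcpy for the float/integer reinterpretation instead of the original pointer cast (which is undefined behaviour in C++); the function names are hypothetical. With its single Newton-Raphson step, its worst relative error is around 0.2%, compared with roughly 1e-7 for a correctly rounded float 1.0f / std::sqrt.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// "Quake III" inverse square root: magic-constant initial guess plus
// one Newton-Raphson step. Shown for comparison, not for use.
inline float quake_rsqrt(float x)
{
    std::uint32_t i;
    std::memcpy(&i, &x, sizeof i);          // reinterpret float bits as an integer
    i = 0x5f3759dfu - (i >> 1);             // initial guess via the magic constant
    float y;
    std::memcpy(&y, &i, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);   // one Newton-Raphson step
}

// Worst relative error of quake_rsqrt over `samples` evenly spaced x in [lo, hi), lo > 0.
inline double max_quake_rel_error(float lo, float hi, int samples)
{
    double worst = 0.0;
    for (int i = 0; i < samples; ++i) {
        const float x = lo + (hi - lo) * static_cast<float>(i) / static_cast<float>(samples);
        const double exact = 1.0 / std::sqrt(static_cast<double>(x));
        worst = std::max(worst, std::fabs(static_cast<double>(quake_rsqrt(x)) - exact) / exact);
    }
    return worst;
}
```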

When I was working in the games industry, at least once a month some new guy would try to 'optimise' the code by replacing std::sqrt with it. Sigh.

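TL;DR: If you have fast math enabled, just use std::sqrt. If fast math is disabled and the implementation reports math errors through errno (math_errhandling & MATH_ERRNO), sqrt of a negative number must set errno to EDOM, which forces the compiler to keep a call to the std library version, usually as a fallback branch around an inline sqrtss. GCC and Clang's -fno-math-errno drops just that requirement without enabling the rest of fast math.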
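The errno requirement can be observed directly. This small sketch (the function name is hypothetical) calls std::sqrt on a negative value and checks whether errno was set to EDOM; whether the library uses errno at all is implementation-defined, which is what math_errhandling reports. The volatile is there only to stop the compiler from constant-folding the call away.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>

// Returns true if std::sqrt(-1.0) returned NaN and reported a domain error via errno.
inline bool sqrt_sets_errno()
{
    errno = 0;
    volatile double neg = -1.0;   // prevent constant folding of the call
    const double r = std::sqrt(neg);
    return std::isnan(r) && errno == EDOM;
}
```

On a typical glibc build without -ffast-math or -fno-math-errno, math_errhandling includes MATH_ERRNO and this returns true; that side effect is exactly what the inline sqrtss alone cannot provide.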
