Fast way to "improve" the length of a unit length vector

Question

Paying for a full vector normalization in performance-critical code when it is known that the vector is already almost unit-length seems wasteful.

Does anyone know of a fast, practical method to bring the length of a double-precision 3D vector closer to 1? I'm imagining an iterative method based on Newton-Raphson iterations or limited Taylor expansion around 1.

Here is an actual real-world situation where such a routine could be useful. The incoming vector is already almost unit length, but without an explicit normalization it still triggers assertions down the line.

Using SSE 2, SSE 4.2 or AVX intrinsics is OK.

Nominal Animal · Accepted Answer

The problem at hand boils down to finding (an approximate) reciprocal square root.

SSE and AVX include an approximate reciprocal square root machine instruction, rsqrt, that is particularly well suited for this. Per the original AMD64 Architecture Programmer's Manual, volume 1, the maximum relative error of the reciprocal square root variants is at most 1.5×2^-12, or less than 0.0004.

If you use GCC, you can use the __builtin_ia32_rsqrtss() SSE built-in function to compute the reciprocal square root of the squared length of the vector, and multiply the vector components by the result, to get an "almost unit" vector.

Note that both SSE and AVX provide functions that speed up the calculation of the squared length, as well as multiplying each component. (You'll need to copy the scale factor to a vector of equal size, though.)

Without SSE/AVX, the general problem is that we wish to multiply the vector components by f(S) ≃ sqrt(1/S) == 1/sqrt(S), where S is the inner product (dot product) of the vector and itself, i.e. its length squared; but sqrt() is considered too slow, and S is known to already be close to 1.

Any function f(S) whose value is between 1 and sqrt(1/S), within the range we consider "close to 1", will work. The simplest I can think of are functions of form f(S) = (C + 1 - S) / C. For S = 0.5² to 2² (i.e. for vectors with length between 1/2 and 2), C is 6.

If we did not have any hardware support for reciprocal square root, the first approximation I would try would be along the following lines:

Compute the squared length S of the vector
Compute M = 0.125 * (9 - S)

Note that any constant pair C₁ and C₂ = 1 + 1 / C₁ should work, only the range and rate of convergence varies. I picked C₁ = 1/8 for this example simply because it is exact in IEEE-754 floating-point representation, and typically multiplication is much faster than division. Other values (like 1/6 I mentioned above for range 0.5 to 2) are inexact and may need finessing by hand (adjusting the least significant unit one way or another in the two constants).
Multiply each component of the vector by M.

If that did not yield good enough results, I'd stop worrying about it, and use (hardware) square root instead. (On some architectures, casting the squared length to single precision for computing the scale factor can yield a significant speedup. Not on x86/AMD64, however.)

Fast way to "improve" the length of a unit length vector

Answers (1)

Related Questions

Fast way to &quot;improve&quot; the length of a unit length vector

Answers (1)

Related Questions

Fast way to "improve" the length of a unit length vector