Veridian
Veridian

Reputation: 3667

Determining Floating Point Square Root

How do I determine the square root of a floating point number? Is the Newton-Raphson method a good way? I have no hardware square root either. I also have no hardware divide (but I have implemented floating point divide).

If possible, I would prefer to reduce the number of divides as much as possible since they are so expensive.

Also, what should be the initial guess to reduce the total number of iterations???

Thank you so much!

Upvotes: 10

Views: 13815

Answers (4)

ElKamina
ElKamina

Reputation: 7807

Easiest to implement (you can even implement this in a calculator):

def sqrt(x, TOL=0.000001):
    y=1.0
    while( abs(x/y -y) > TOL ):
        y= (y+x/y)/2.0
    return y

This is exactly equal to newton raphson:

y(new) = y - f(y)/f'(y)

f(y) = y^2-x and f'(y) = 2y

Substituting these values:

y(new) = y - (y^2-x)/2y = (y^2+x)/2y = (y+x/y)/2

If division is expensive you should consider: http://en.wikipedia.org/wiki/Shifting_nth-root_algorithm .

Shifting algorithms:

Let us assume you have two numbers a and b such that least significant digit (equal to 1) is larger than b and b has only one bit equal to (eg. a=1000 and b=10). Let s(b) = log_2(b) (which is just the location of bit valued 1 in b).

Assume we already know the value of a^2. Now (a+b)^2 = a^2 + 2ab + b^2. a^2 is already known, 2ab: shift a by s(b)+1, b^2: shift b by s(b).

Algorithm:

Initialize a such that a has only one bit equal to one and a^2<= n < (2*a)^2. 
Let q=s(a).    
b=a
sqra = a*a

For i = q-1 to -10 (or whatever significance you want):
    b=b/2
    sqrab = sqra + 2ab + b^2
    if sqrab > n:
        continue
    sqra = sqrab
    a=a+b

n=612
a=10000 (16)

sqra = 256

Iteration 1:
    b=01000 (8) 
    sqrab = (a+b)^2 = 24^2 = 576
    sqrab < n => a=a+b = 24

Iteration 2:
    b = 4
    sqrab = (a+b)^2 = 28^2 = 784
    sqrab > n => a=a

Iteration 3:
    b = 2
    sqrab = (a+b)^2 = 26^2 = 676
    sqrab > n => a=a

Iteration 4:
    b = 1
    sqrab = (a+b)^2 = 25^2 = 625
    sqrab > n => a=a

Iteration 5:
    b = 0.5
    sqrab = (a+b)^2 = 24.5^2 = 600.25
    sqrab < n => a=a+b = 24.5

Iteration 6:
    b = 0.25
    sqrab = (a+b)^2 = 24.75^2 = 612.5625
    sqrab < n => a=a


Iteration 7:
    b = 0.125
    sqrab = (a+b)^2 = 24.625^2 = 606.390625
    sqrab < n => a=a+b = 24.625

and so on.

Upvotes: 2

Stephen Canon
Stephen Canon

Reputation: 106127

When you use Newton-Raphson to compute a square-root, you actually want to use the iteration to find the reciprocal square root (after which you can simply multiply by the input--with some care for rounding--to produce the square root).

More precisely: we use the function f(x) = x^-2 - n. Clearly, if f(x) = 0, then x = 1/sqrt(n). This gives rise to the newton iteration:

x_(i+1) = x_i - f(x_i)/f'(x_i)
        = x_i - (x_i^-2 - n)/(-2x_i^-3)
        = x_i + (x_i - nx_i^3)/2
        = x_i*(3/2 - 1/2 nx_i^2)

Note that (unlike the iteration for the square root), this iteration for the reciprocal square root involves no divisions, so it is generally much more efficient.

I mentioned in your question on divide that you should look at existing soft-float libraries, rather than re-inventing the wheel. That advice applies here as well. This function has already been implemented in existing soft-float libraries.


Edit: the questioner seems to still be confused, so let's work an example: sqrt(612). 612 is 1.1953125 x 2^9 (or b1.0011001 x 2^9, if you prefer binary). Pull out the even portion of the exponent (9) to write the input as f * 2^(2m), where m is an integer and f is in the range [1,4). Then we will have:

sqrt(n) = sqrt(f * 2^2m) = sqrt(f)*2^m

applying this reduction to our example gives f = 1.1953125 * 2 = 2.390625 (b10.011001) and m = 4. Now do a newton-raphson iteration to find x = 1/sqrt(f), using a starting guess of 0.5 (as I noted in a comment, this guess converges for all f, but you can do significantly better using a linear approximation as an initial guess):

x_0 = 0.5
x_1 = x_0*(3/2 - 1/2 * 2.390625 * x_0^2)
    = 0.6005859...
x_2 = x_1*(3/2 - 1/2 * 2.390625 * x_1^2)
    = 0.6419342...
x_3 = 0.6467077...
x_4 = 0.6467616...

So even with a (relatively bad) initial guess, we get rapid convergence to the true value of 1/sqrt(f) = 0.6467616600226026.

Now we simply assemble the final result:

sqrt(f) = x_n * f = 1.5461646...
sqrt(n) = sqrt(f) * 2^m = 24.738633...

And check: sqrt(612) = 24.738633...

Obviously, if you want correct rounding, careful analysis needed to ensure that you carry sufficient precision at each stage of the computation. This requires careful bookkeeping, but it isn't rocket science. You simply keep careful error bounds and propagate them through the algorithm.

If you want to correct rounding without explicitly checking a residual, you need to compute sqrt(f) to a precision of 2p + 2 bits (where p is precision of the source and destination type). However, you can also take the strategy of computing sqrt(f) to a little more than p bits, square that value, and adjust the trailing bit by one if necessary (which is often cheaper).

sqrt is nice in that it is a unary function, which makes exhaustive testing for single-precision feasible on commodity hardware.

You can find the OS X soft-float sqrtf function on opensource.apple.com, which uses the algorithm described above (I wrote it, as it happens). It is licensed under the APSL, which may or not be suitable for your needs.

Upvotes: 13

emboss
emboss

Reputation: 39620

Probably (still) the fastest implementation for finding the inverse square root and the 10 lines of code that I adore the most.

It's based on Newton Approximation, but with a few quirks. There's even a great story around this.

Upvotes: 4

Dan Piponi
Dan Piponi

Reputation: 8116

A good approximation to square root on the range [1,4) is

def sqrt(x):
  y = x*-0.000267
  y = x*(0.004686+y)
  y = x*(-0.034810+y)
  y = x*(0.144780+y)
  y = x*(-0.387893+y)
  y = x*(0.958108+y)
  return y+0.315413

Normalise your floating point number so the mantissa is in the range [1,4), use the above algorithm on it, and then divide the exponent by 2. No floating point divisions anywhere.

With the same CPU time budget you can probably do much better, but that seems like a good starting point.

Upvotes: 1

Related Questions