Z boson

Reputation: 33659

accuracy of sqrt of integers

I have a loop like this:

for(uint64_t i=0; i*i<n; i++) {

This requires doing a multiplication every iteration. If I could calculate the sqrt before the loop then I could avoid this.

unsigned cut = sqrt(n);
for(uint64_t i=0; i<cut; i++) {

In my case it's okay if the sqrt function rounds up to the next integer but it's not okay if it rounds down.

My question is: is the sqrt function accurate enough to do this for all cases?

Edit: Let me list some cases. If n is a perfect square, so that n = y^2, my question becomes: is cut = sqrt(n) >= y for all n? If cut = y-1 then there is a problem. E.g. if n = 120 and cut = 10 it's okay, but if n = 121 (11^2) and cut is still 10 then it won't work.

My first concern was that the fractional part of a float has only 23 bits and a double 52, so they can't store all the digits of some 32-bit or 64-bit integers. However, I don't think this is a problem. Assume we want the sqrt of some number y but can't store all of its digits. If we let x be the part of y we can store, we can write y = x + dx; then we want to make sure that the error dx does not move the square root up to the next integer:

sqrt(x+dx) < sqrt(x) + 1  //solve
dx < 2*sqrt(x) + 1 
// e.g for x = 100 dx < 21
// sqrt(100+20) < sqrt(100) + 1

A float can store 23 fraction bits, so let y = 2^23 + 2^9. This is more than sufficient, since 2^9 < 2*sqrt(2^23) + 1. It's easy to show the same for double with 64-bit integers. So although they can't store all the digits, as long as the sqrt of what they can store is accurate, the sqrt of the stored part is sufficient. Now let's look at what happens for integers close to UINT_MAX and their sqrt:

unsigned xi = -1-1;
printf("%u %u\n", xi, (unsigned)(float)xi);  //4294967294 4294967295
printf("%u %u\n", (unsigned)sqrt(xi), (unsigned)sqrtf(xi));  //65535 65536

Since float can't store all the digits of 2^32-2 and double can, they get different results for the sqrt. But the float version of the sqrt is one integer larger, which is what I want. For 64-bit integers, as long as the sqrt of the double always rounds up, it's okay.

Upvotes: 2

Views: 1742

Answers (3)

John Z. Li

Reputation: 1995

What you are looking for is a way to calculate a rational upper bound of the square root of a natural number. A continued fraction expansion is what you need; see the Wikipedia article on continued fractions.

For x > 0, the square root satisfies

    sqrt(x) = 1 + (x - 1)/(1 + sqrt(x))

To make the notation more compact, substitute the left-hand side into its own denominator repeatedly, rewriting the above formula as the continued fraction

    sqrt(x) = 1 + (x - 1)/(2 + (x - 1)/(2 + (x - 1)/(2 + ...)))

Truncating the continued fraction by removing the tail term at each recursion depth, one gets a sequence of approximations of sqrt(x):

    a_1 = 1 + (x - 1)/2
    a_2 = 1 + (x - 1)/(2 + (x - 1)/2)
    a_3 = 1 + (x - 1)/(2 + (x - 1)/(2 + (x - 1)/2))
    ...

Upper bounds appear at odd depths, and they get tighter as the depth increases. When the distance between an upper bound and its neighboring lower bound is less than 1, that upper bound is the approximation you need. Using that value for cut (here cut must be a floating-point number) solves the problem.

For very large numbers, rational arithmetic should be used, so that no precision is lost during conversion between integer and floating-point representations.

Upvotes: 0

aka.nice

Reputation: 9362

My whole answer is unnecessary if you have access to IEEE 754 double-precision floating point, since Stephen Canon demonstrated both

  • a simple way to avoid imul in loop
  • a simple way to compute the ceiling sqrt

Otherwise, if for some reason you have a non-IEEE-754-compliant platform, or only single precision, you can get the integer part of the square root with a simple Newton-Raphson loop. For example, in Squeak Smalltalk we have this method in Integer:

sqrtFloor
    "Return the integer part of the square root of self"

    | guess delta |
    guess := 1 bitShift: (self highBit + 1) // 2.
    [
        delta := (guess squared - self) // (guess + guess).
        delta = 0 ] whileFalse: [
            guess := guess - delta ].
    ^guess - 1

Here // is the operator for the quotient of integer division (floor division).
The final guard guess*guess <= self ifTrue: [^guess]. can be avoided when the initial guess exceeds the exact solution, as is the case here.
Initializing with an approximate float sqrt is not an option in Smalltalk, because the integers are arbitrarily large and might overflow the float range.

But here, you could seed the initial guess with floating point sqrt approximation, and my bet is that the exact solution will be found in very few loops. In C that would be:

#include <math.h>
#include <stdint.h>

uint32_t sqrtFloor(uint64_t n)
{
    int64_t diff;
    int64_t delta;
    uint64_t guess = sqrt(n); /* implicit conversions here... */
    while ((delta = (diff = guess*guess - n) / (guess + guess)) != 0)
        guess -= delta;
    return guess - (diff > 0);
}

That's a few integer multiplications and divisions, but outside the main loop.

Upvotes: 3

Stephen Canon

Reputation: 106117

First, integer multiplication is really quite cheap. So long as you have more than a few cycles of work per loop iteration and one spare execute slot, it should be entirely hidden by reorder on most non-tiny processors.

If you did have a processor with dramatically slow integer multiply, a truly clever compiler might transform your loop to:

for (uint64_t i = 0, j = 0; j < n; j += 2*i+1, i++)

replacing the multiply with an lea or a shift and two adds.


Those notes aside, let’s look at your question as stated. No, you can’t just use i < sqrt(n). Counter-example: n = 0x20000000000000. Assuming adherence to IEEE-754, you will have cut = 0x5a82799, and cut*cut is 0x1ffffff8eff971.

However, a basic floating-point error analysis shows that the error in computing sqrt(n) (before conversion to integer) is bounded by 3/4 of an ULP. So you can safely use:

uint32_t cut = sqrt(n) + 1;

and you’ll perform at most one extra loop iteration, which is probably acceptable. If you want to be totally precise, instead use:

uint32_t cut = sqrt(n);
cut += (uint64_t)cut*cut < n;

Edit: z boson clarifies that for his purposes, this only matters when n is an exact square (otherwise, getting a value of cut that is “too small by one” is acceptable). In that case, there is no need for the adjustment and one can safely just use:

uint32_t cut = sqrt(n);

Why is this true? It’s pretty simple to see, actually. Converting n to double introduces a perturbation:

double_n = n*(1 + e)

which satisfies |e| < 2^-53. The mathematical square root of this value can be expanded as follows:

square_root(double_n) = square_root(n)*square_root(1+e)

Now, since n is assumed to be a perfect square with at most 64 bits, square_root(n) is an exact integer with at most 32 bits, and is the mathematically precise value that we hope to compute. To analyze the square_root(1+e) term, use a Taylor series about 1:

square_root(1+e) = 1 + e/2 + O(e^2)
                 = 1 + d with |d| <~ 2^-54

Thus, the mathematically exact value square_root(double_n) is less than half an ULP away from[1] the desired exact answer, and necessarily rounds to that value.

[1] I’m being fast and loose here in my abuse of relative error estimates, where the relative size of an ULP actually varies across a binade — I’m trying to give a bit of the flavor of the proof without getting too bogged down in details. This can all be made perfectly rigorous, it just gets to be a bit wordy for Stack Overflow.

Upvotes: 7
