Jayram
Jayram

Reputation: 19578

Fixing Binary search bug from Bentley's book (programming pearls: writing correct programs)

Binary search can be implemented in many ways-recursive, iterative, conditionals, etc. I took this from Bentley's book "Programming pearls: Writing correct programs" which is an iterative implementation, and that includes a bug.

 public class BinSearch 
    {
       static int search( int [] A, int K ) {
          int l = 0;
          int u = A. length -1;
          int m;
          while ( l <= u ) {
              m = (l+u) /2;
              if (A[m] < K){
              l = m + 1;
              } else if (A[m] == K){
                    return m;
              } else {
                    u = m-1;
              }
         }
    return -1;
    }
}

I found a bug in the line m = (l+u) /2; it can lead to overflow. How can we avoid this overflow in this binary search?

Upvotes: 16

Views: 4677

Answers (7)

unknown_boundaries
unknown_boundaries

Reputation: 1580

Although Machtl and others have already written the way to overcome the overflow bug but adding just for the sake of completeness

  1. Use int mid = (low + high) >>> 1; faster way
  2. And in C and C++ (where we don't have the >>> operator) mid = ((unsigned int)low + (unsigned int)high)) >> 1;

Upvotes: -1

Srikar Appalaraju
Srikar Appalaraju

Reputation: 73608

The iterative implementation binary search indeed has a bug. change m = (l+u)/2 . As others have mentioned it can lead to integer overflow. Replace that by m = l + (u-l)/2.

From experience I have time and again seen buggy binary search implementations. Binary search although a simple concept involving divide and conquer can be difficult to get right. Its easy to change the above m assignment. Hope this helps...

Upvotes: 1

Gene
Gene

Reputation: 46960

As several others have said, the fix is simple, certainly the simplest I've seen for a 100-point bounty! Here is another, which has a nice symmetry, even if it takes a few more clock cycles:

m = (l >> 1) + (u >> 1) + (l & u & 1);

You should not malign Bentley for "a bug" until you have better information. When he wrote that article for the ACM (some time in the 1980's I think), he was pseudocoding and writing in 32-bit C; machines with gigabytes of RAM didn't exist. Even if they had, with 4-byte ints, a 32-bit machine can't have an array with more than 2^28 ints. Therefore the highest possible index is 2^28-1. Doubling this value does not cause an in int to overflow.

Of course, it's exactly the same with 32-bit Java. You need the broken kludge of 64-bit Java - a language that allows objects of size approaching 2^64 but limits indices to 2^32-1 in order to cause this "bug" to appear.

What you call a bug is a change of operating assumptions. Every program in the universe will manifest some kind of flaw if the environment changes in the right way.

Upvotes: 5

frankyym
frankyym

Reputation: 171

Suppose l and u are int both fall into [0, 2^31-1]. If l, u >= 2^30, then (l+u) >= 2^31 is overflow. To avoid this, use

m = l + (u-l)/2; 

instead. What is more, it may be more reasonable to write binary search like this:

    public class BinSearch 
    {
        static int search( int [] A, int K ) {
            int l = -1;             // index lower bound shift left by 1
            int u = A.length;       // index upper bound shift right by 1
            int m;
            while ( l + 1 < u ) {
                m = l + (u-l)/2;    // avoid overflow
                if (A[m] < K){
                    l = m;          // keep A[l] < K 
                } else {
                    u = m;          // keep A[u] >= K
                }
            }
            if ( (u == A.length) || (A[u] != K) ) return -1;
            return u;
        }
    }

Upvotes: 6

VoidStar
VoidStar

Reputation: 984

Try the following:

change

m = (l+u) /2

to

m = (u-l) / 2 + l

The reason why the (l+u) / 2 can overflow becomes obvious if you consider a very large array of 2^31 - 1 elements (maximum value a signed 32-bit integer can hold). In this case the first iteration is just fine because 2^31 - 1 + 0 is not a big deal but consider the case of l = m + 1 here. In the second iteration u is still the same and l is 2^31 / 2 so l + u will lead to an overflow.

This way we are avoiding the addition of u + l by first determining the relative middle between l and u (u - l) / 2 and then adding the lower number l to it so it becomes absolute. So during the operation m = (u-l) / 2 + l; we never exceed the value of u.

To summarize the complete code:

public class BinSearch 
{
    static int search( int [] A, int K ) 
    {
        int l = 0;
        int u = A. length -1;
        int m;

        while ( l <= u ) 
        {
            m = (u-l) / 2 + l;

            if (A[m] < K)
                l = m + 1;
            else if (A[m] == K)
                return m;
            else
                u = m - 1;
        }
        return -1;
     }
}

Upvotes: 16

Ayberk
Ayberk

Reputation: 410

Try changing

m = (l+u) / 2

to

m = l + (u - l) / 2

It is trivial to see that both is equal and also the second statement prevents the overflow.

Upvotes: 2

陆彦帑
陆彦帑

Reputation: 21

I think you should change u = m-l; to u = m -1.

it's 1 not l. ---------addtion------ cause (l+u) may be greater than 2^31-1 (if the int is 32bits), so it may overflow. so you should change (l+u)/2 to l+((u-l)>>1), and u-l cannot be greater than 2^31-1.

Upvotes: 1

Related Questions