Intrepid Diamond
Intrepid Diamond

Reputation: 482

Explain the difference between these Midpoint Algorithms

Why does the midpoint algorithm for Binary Search use

low + (high-low)/2

rather than

(low + high)/2

Upvotes: 8

Views: 525

Answers (2)

en_Knight
en_Knight

Reputation: 5381

Your question is tagged for python, so I'll answer for python. In short, it doesn't:

https://hg.python.org/cpython/file/2.7/Lib/bisect.py

The pythonic implementation above found in the docs uses the latter construction. As people in the comments have pointed out, some languages need to respect overflow. Python isn't none of them and has arbitrary precision integers.

In the comments it was speculated that someone porting from a C-like language might copy the more acceptable construction for that language. This is possible. Someone else commented that one might be faster than the other; such a micro-optimization seems to be difficult to comment on in general.

But... what if they aren't Ints!

I have assumed that these are integers because for binary search, the indices are integers. If they are indeed not integers, then you are going to have some problems using them to access arrays. But in the mean time, you might experiene different results:

a = b = sys.float_info.max
print a + (a-b)/2 # prints a really big number
print (a+b)/2 # prints inf

Similarly,

a = b = float("inf")
print a+(a-b)/2 # prints nan
print (a+b)/2 # prints inf

That latter example is different, albeit it isn't clear to me which is better. For why this occurs, you can look at the overflow explanations in the article linked above.

Upvotes: 4

Piyush Gupta
Piyush Gupta

Reputation: 2179

I searched this question on google and found very interesting answer on http://googleresearch.blogspot.in/2006/06/extra-extra-read-all-about-it-nearly.html

Here is example:

1:     public static int binarySearch(int[] a, int key) {
2:         int low = 0;
3:         int high = a.length - 1;
4:
5:         while (low <= high) {
6:             int mid = (low + high) / 2;
7:             int midVal = a[mid];
8:
9:             if (midVal < key)
10:                 low = mid + 1
11:             else if (midVal > key)
12:                 high = mid - 1;
13:             else
14:                 return mid; // key found
15:         }
16:         return -(low + 1);  // key not found.
17:     }

The bug is in this line:

int mid =(low + high) / 2;

In Programming Pearls Bentley says that the analogous line "sets m to the average of l and u, truncated down to the nearest integer." On the face of it, this assertion might appear correct, but it fails for large values of the int variables low and high. Specifically, it fails if the sum of low and high is greater than the maximum positive int value (231 - 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results. In Java, it throws ArrayIndexOutOfBoundsException.

This bug can manifest itself for arrays whose length (in elements) is 230 or greater (roughly a billion elements). This was inconceivable back in the '80s, when Programming Pearls was written, but it is common these days at Google and other places. In Programming Pearls, Bentley says "While the first binary search was published in 1946, the first binary search that works correctly for all values of n did not appear until 1962." The truth is, very few correct versions have ever been published, at least in mainstream programming languages.

So what's the best way to fix the bug? Here's one way:

 int mid = low + ((high - low) / 2);

Upvotes: 0

Related Questions