Why is using upper bound value of middle expression in binary search wrong?

Question

In a typical binary search algorithm (e.g. in Java), we find the selection of the middle element using the floor rather than the ceiling of the division:

public static void binarySearch(int[] array, int lowerbound, int upperbound, int key)
{
  int comparisonCount = 0;    // counting the number of comparisons (optional)        
  while (lowerbound <= upperbound)
  {
    final int position = (lowerbound + upperbound) / 2;
    ++comparisonCount;
    if (array[position] == key)
    {
      System.out.println("The number was found in array subscript " + position + '.');
      System.out.println("The binary search found the number after " + comparisonCount +
                         " comparisons.");
      return;
    }
    if (array[position] > key)             // If the number is > key, ..
    {                                      // decrease position by one. 
      upperbound = position - 1;   
    }                                                             
    else                                                   
    {                                                        
      lowerbound = position + 1;           // Else, increase position by one. 
    }
  }
  System.out.println("Sorry, the number is not in this array. The binary search made "
                     + comparisonCount  + " comparisons.");
}

The formula for position here uses integer division which rounds down (e.g. 3.5 goes to 3). My professor said the alternative of rounding it up will cause a bug. If so, what is it? Why is it necessary that the value is rounded down instead of up?

seh · Accepted Answer

Say that there was only one element in the sequence to be searched. Your lower bound would be zero, and the upper bound one, as the bounds denote a half-open range. This has the property that subtracting the lower bound from the upper bound yields the length of the sequence.

If the lower bound is zero and the upper bound is one, and you evaluate the expression

(ceiling (+ lower-bound upper-bound) 2)

you get one, which is not a valid index of the lone element in this sequence. The only valid index is zero, which the more conventional expression

(floor (+ lower-bound upper-bound) 2)

would yield.

If you had three elements, where the middle element obviously has an index of one, again consider that

(ceiling (+ 0 3) 2)

equals two, which is the the index of the last element, not the middle element.

We can further explore your question by asking whether such a poor choice of a middle element would still eventually yield a correct answer. The algorithm will still wander in the right direction toward the proper element (or place where such an element would belong, if absent from the sequence), but the algorithm would fail when the considered subsequence shrinks to just one element—again, because it will mistakenly select an index one greater than the remaining element.

Why is using upper bound value of middle expression in binary search wrong?

Answers (1)

Related Questions