Binary search - worst/avg case

I'm finding it difficult to understand why/how the worst and average case for searching for a key in an array/list using binary search is O(log(n)).

log(1,000,000) is only 6. log(1,000,000,000) is only 9 - I get that, but I don't understand the explanation. If one did not test it, how do we know that the avg/worst case is actually log(n)?

I hope you guys understand what I'm trying to say. If not, please let me know and I'll try to explain it differently.

Upvotes: 2

Answers (4)

Tanmoy Banerjee

Reputation: 1453

For Binary search the prerequisite is a sorted array as input.

• As the list is sorted:
• Certainly we don't have to check every word in the dictionary to look up a word.
• A basic strategy is to repeatedly halve our search range until we find the value.
• For example, look for 5 in the list of 9 #s below.v = 1 1 3 5 8 10 18 33 42
• We would first start in the middle: 8
• Since 5<8, we know we can look at just the first half: 1 1 3 5
• Looking at the middle # again, narrow down to 3 5
• Then we stop when we're down to one #: 5
How many comparison is needed: 4 =log(base 2)(9-1)=O(log(base2)n)

int binary_search (vector<int> v, int val) {
 int from = 0;
 int to = v.size()-1;
 int mid;
 while (from <= to) {
  mid = (from+to)/2;
  if (val == v[mid])
    return mid;
  else if (val > v[mid])
    from = mid+1;
  else
    to = mid-1;
  }
 return -1;
}

Upvotes: 0

antinome

Reputation: 3458

Worst case

Every time the binary search code makes a decision, it eliminates half of the remaining elements from consideration. So you're dividing the number of elements by 2 with each decision.

How many times can you divide by 2 before you are down to only a single element? If n is the starting number of elements and x is the number of times you divide by 2, we can write this as:

n / (2 * 2 * 2 * ... * 2) = 1 [the '2' is repeated x times]

or, equivalently,

n / 2^x = 1

or, equivalently,

n = 2^x

So log base 2 of n gives you x, which is the number of decisions being made.

Finally, you might ask, if I used log base 2, why is it also OK to write it as log base 10, as you have done? The base does not matter because the difference is only a constant factor which is "ignored" by Big O notation.

Average case

I see that you also asked about the average case. Consider:

There is only one element in the array that can be found on the first try.
There are only two elements that can be found on the second try. (Because after the first try, we chose either the right half or the left half.)
There are only four elements that can be found on the third try.

You can see the pattern: 1, 2, 4, 8, ... , n/2. To express the same pattern going in the other direction:

Half the elements take the maximum number of decisions to find.
A quarter of the elements take one fewer decision to find.
etc.

Since half of the elements take the maximum amount of time, it doesn't matter how much less time the other elements take. We could assume that all elements take the maximum amount of time, and even if half of them actually take 0 time, our assumption would not be more than double whatever the true average is. We can ignore "double" since it is a constant factor. So the average case is the same as the worst case, as far as Big O notation is concerned.

Upvotes: 4

Corey Rowell

Reputation: 282

For sorted lists which we can do a binary search, each "decision" made by the binary search compares your key to the middle element, if greater it takes the right half of the list, if less it will take the left half of the list (if it's a match it will return the element at that position) you effectively reduce your list by half for every decision yielding O(logn).

Binary search however, only works for sorted lists. For un-sorted lists you can do a straight search starting with the first element yielding a complexity of O(n).

O(logn) < O(n)

Although it entirely depends on how many searches you'll be doing, your inputs, etc what your best approach would be.

Upvotes: 0

Am_I_Helpful

Reputation: 19178

For binary search, the array should be arranged in ascending or descending order.

In each step, the algorithm compares the search key value with the key value of the middle element of the array.
If the keys match, then a matching element has been found and its index, or position, is returned.
Otherwise, if the search key is less than the middle element's key, then the algorithm repeats its action on the sub-array to the left of the middle element.
Or, if the search key is greater,then the algorithm repeats its action on the sub-array to the right.
If the remaining array to be searched is empty, then the key cannot be found in the array and a special "not found" indication is returned.

So, a binary search is a dichotomic divide and conquer search algorithm. Thereby it takes logarithmic time for performing the search operation as the elements are reduced by half in each of the iteration.

Upvotes: 1

Binary search - worst/avg case

Answers (4)

Related Questions