Reputation: 33
I am studying Data structures and Algorithms and I am stuck on the average unsuccessful case of binary search. I could not find it in my book (data structures by Lipschutz) and even on the various resources over internet they all have only stated the formulas without giving any explanations. The formulas are :
Sn= (I+n)/n
and
Un= E/(n+1)
where
Sn= number of comparisons in case of successful search
Un= number of comparisons in case of unsuccessful search
I= internal path length of the binary tree, and
E= external path length of the binary tree.
n is the number of nodes in the binary tree.
Here i want to state that I do know how to get the comparisons in both the cases when an actual binary search case is given by making the binary search tree for it as explained here
But after explaining it with the binary tree they have also simply stated the formula to get a general representation of it.
the first formula for Sn
I completely understood but the formula for Un
is just not clear to me.
Upvotes: 2
Views: 11237
Reputation: 62
It is because the average number of comparisons for unsuccessful search is equal to the 'average external path length' of the tree which is given by the expression E/(n+1) , since there are (n+1) external nodes which represent all the fail cases.
You are comparing it with the average successful case which is causing you the confusion. In the case of successful search number of comparisons is equal to the number of nodes in the path from the root node to that success node. But unsuccessful search can be understood as the insertion of that particular node in the tree. ie: insertion of an element is equivalent to the unsuccessful search for that node. And we know that the number of comparisons to find a node in the tree is exactly 1 more than the number of comparisons needed to insert it.
For example: consider the sorted array: 3,7,10,13,15
its binary search can be represented by the following binary search tree:
10
/ \
/ \
/ \
3 13
/ \ / \
F 7 F 15
/ \ / \
F F F F
where F denotes the fail cases.
Now this shows that if you search for 10 in this array it will take just one comparison, if you search for 3 or 13 both will take 2 comparisons each, similarly if u search for 7 or 15 it will take 3 comparisons each.
Internal path length gives the number of edges from root to a particular internal node so for each node it will be one less than the number of comparisons required to successfully search for that node. Hence we add 1 for each node to the internal path length which leads to a total of n (since there are n internal nodes) so the formula (I+n)/n is justified.
Now coming to the case of unsuccessful search, suppose in this example array you want to search for 2 (which is not present in the array) so the first comparison will be made with the mid element ie 10 , then second comparison with 3 and since the search cannot go lower than 3 our search ends here with only 2 comparisons, unlike what u stated that it would take 3 comparisons since it is on level 2 of the tree. Similarly while searching for 5 in this array you will have to make 3 comparisons and not 4. Hence we see that each fail node requires comparisons equal to its path length. And so we don not have to add an extra 1 for each external node and the average comparisons for unsuccessful search is equal to the average external path length of the binary tree. which justifies the formula E/(n+1).
I guess this should clear your doubt of why we didnt add an extra 1 for each node.
cheers!
Upvotes: 3