user11554726
user11554726

Reputation:

Modify Binary Search Function

Good morning, Instead of always using the middle of the list to search (binary search), you can estimate the position of the item to be searched based on the max value (last item), min value (first item) and the search value.

((Assume uniform distribution of the items and the items are sorted.)) This is my code below , do ya"ll have any suggested codes?

def binary_search(seq,item):
"""It uses non recursive method to search the item in the given seq. 
   It returns the position of item if found, None otherwise"""

left_index=0
right_index=len(seq)-1
while left_index <= right_index:            #stop searching when left_index > right_indext
    mid_index=(right_index + left_index)//2 #find the mid point
    if seq[mid_index]==item:
        return mid_index
    elif seq[mid_index]>item:
        right_index = mid_index -1          #if mid point ele > search ele, move right pointer
    else:  
        left_index = mid_index + 1          #if mid point ele < search ele, move left pointer
return None
a=[1,2,3,4,5,6,7,8,9]
print(binary_search(a,6))

Upvotes: 0

Views: 548

Answers (2)

trincot
trincot

Reputation: 350077

The algorithm you are hinting at is called interpolation search. You can implement it by changing the formula for getting mid_index as follows:

mid_index = left_index + round(
    (item - seq[left_index]) / (seq[right_index] - seq[left_index]) 
    * (right_index - left_index)
)

You should however protect against an out of range reference, as the searched item could well be greater than the maximum in the list (or less than the minimum), and in that case the above index could point beyond the length of the list or be negative.

So before entering the loop do:

if len(seq) == 0 or item < seq[0] or item > seq[-1]: return None

Alternatively, you could put an extra test in the loop to exit when the item value is no longer between seq[left_index] and seq[right_index].

Upvotes: 0

paxdiablo
paxdiablo

Reputation: 881153

Assume uniform distribution of the items and the items are sorted.

Putting aside the fact that a binary search requires sorted data regardless of where you choose the split point, the ability to make assumptions about the data means that optimisations like this are possible.

In fact, if you assume the data is always the unique numbers 1..100 with no gaps, you can make it even faster :-)

That won't really help with the general case of course, which you'll see if you run your algorithm over the data set { 1, 2, 3, 4, 5, ..., 100, 99999999 }, looking for 100.

Your algorithm would expect to find that very early on in the array rather than at the penultimate index.


The ability to assume data set properties has been used successfully in many situations. For example, with English surnames in a hash lookup, you may give names starting with E their own bucket while lumping those starting with Z, X, V and Y into a single bucket for them all (assuming that the names starting with E are far more common than those others).

Upvotes: 1

Related Questions