Good morning. Instead of always probing the middle of the list (binary search), you can estimate the position of the target item from the max value (last item), the min value (first item), and the search value. (Assume the items are sorted and uniformly distributed.) This is my code below; do you all have any suggestions?
def binary_search(seq, item):
    """Non-recursive binary search for item in the sorted sequence seq.
    Returns the index of item if found, None otherwise."""
    left_index = 0
    right_index = len(seq) - 1
    while left_index <= right_index:  # stop searching when left_index > right_index
        mid_index = (right_index + left_index) // 2  # find the mid point
        if seq[mid_index] == item:
            return mid_index
        elif seq[mid_index] > item:
            right_index = mid_index - 1  # mid element > search element: move the right pointer
        else:
            left_index = mid_index + 1  # mid element < search element: move the left pointer
    return None

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(binary_search(a, 6))
The algorithm you are hinting at is called interpolation search. You can implement it by changing the formula for computing mid_index as follows:
mid_index = left_index + round(
(item - seq[left_index]) / (seq[right_index] - seq[left_index])
* (right_index - left_index)
)
You should, however, guard against an out-of-range reference: the searched item may well be greater than the maximum in the list (or less than the minimum), in which case the index above could point beyond the end of the list or be negative.
So before entering the loop do:
if len(seq) == 0 or item < seq[0] or item > seq[-1]:
    return None
Alternatively, you could put an extra test in the loop to exit when the item's value is no longer between seq[left_index] and seq[right_index].
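Putting the modified index formula and the range guard together, a minimal non-recursive sketch might look like this (the function name interpolation_search is my own; the early-exit test and the equal-endpoints check are additions to keep the arithmetic safe):

```python
def interpolation_search(seq, item):
    """Search the sorted sequence seq for item, estimating the probe
    position from the values at the ends of the current range.
    Returns the index of item if found, None otherwise."""
    if len(seq) == 0 or item < seq[0] or item > seq[-1]:
        return None
    left_index = 0
    right_index = len(seq) - 1
    while left_index <= right_index:
        if seq[right_index] == seq[left_index]:
            # avoid division by zero when the range holds equal values
            return left_index if seq[left_index] == item else None
        mid_index = left_index + round(
            (item - seq[left_index]) / (seq[right_index] - seq[left_index])
            * (right_index - left_index)
        )
        if seq[mid_index] == item:
            return mid_index
        elif seq[mid_index] > item:
            right_index = mid_index - 1
        else:
            left_index = mid_index + 1
        # extra test: stop once item can no longer lie in the remaining range
        if left_index > right_index or item < seq[left_index] or item > seq[right_index]:
            return None
    return None

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(interpolation_search(a, 6))  # 5
```

On uniformly distributed data the first probe usually lands very close to the target, which is where the average-case speedup over binary search comes from.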
"Assume uniform distribution of the items and the items are sorted."
Putting aside the fact that a binary search requires sorted data regardless of where you choose the split point, the ability to make assumptions about the data means that optimisations like this are possible.
In fact, if you assume the data is always the unique numbers 1..100 with no gaps, you can make it even faster :-)
That won't really help with the general case, of course, which you'll see if you run your algorithm over the data set { 1, 2, 3, 4, 5, ..., 100, 99999999 }, looking for 100. Your algorithm would expect to find that very early on in the array rather than at the penultimate index.
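A quick way to see that skew is to compute the first probe position the interpolation formula would produce for that data set (this snippet is illustrative, using the formula from the other answer):

```python
# First probe position interpolation search would try on a skewed data set:
# the values 1..100 followed by one huge outlier.
data = list(range(1, 101)) + [99999999]
item = 100

left_index, right_index = 0, len(data) - 1
mid_index = left_index + round(
    (item - data[left_index]) / (data[right_index] - data[left_index])
    * (right_index - left_index)
)
print(mid_index)  # 0 -- nowhere near index 99, where 100 actually lives
```

Because every probe lands far short of the target, the left pointer creeps forward almost one position at a time, so the search degenerates toward linear time on data this skewed.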
The ability to assume data set properties has been used successfully in many situations. For example, with English surnames in a hash lookup, you may give names starting with E their own bucket while lumping those starting with Z, X, V and Y into a single bucket for them all (assuming that the names starting with E are far more common than those others).
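As a sketch of that bucketing idea (the letter choices and sample names here are invented for illustration, not real frequency data):

```python
# Hash surnames by first letter, giving common letters their own bucket
# and folding several rare initial letters into one shared bucket.
RARE_INITIALS = {"Z", "X", "V", "Y"}

def surname_bucket(name):
    """Return a bucket key for a surname based on its first letter."""
    first = name[0].upper()
    return "ZXVY" if first in RARE_INITIALS else first

buckets = {}
for surname in ["Evans", "Edwards", "Young", "Xu", "Vance", "Zhang"]:
    buckets.setdefault(surname_bucket(surname), []).append(surname)

print(buckets)
# {'E': ['Evans', 'Edwards'], 'ZXVY': ['Young', 'Xu', 'Vance', 'Zhang']}
```

The design choice is the same as in interpolation search: use knowledge of the value distribution to balance the work, at the cost of performing badly when the assumed distribution is wrong.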