Reputation: 8951
I've seen a lot of different implementations of this algorithm, but I'm wondering if there are ways to improve the efficiency beyond just making the search binary. I've designed this particular version of the algorithm so the edges and midpoint of an array/list will be checked immediately for the key being searched for, as to avoid looping through a search when the key your looking for is just the first, middle, or last element.
def searchRB(the_array, the_key, imin, imax):
print("searching")
found = False
if (0 > the_key or the_key > len(the_array)):
return found
else:
imid = imin + ((imax - imin) // 2)
if imid == the_key or imin == the_key or imax == the_key:
found = True
return found
elif the_array[imid] > the_key:
return searchRB(the_array, the_key, imin, imid-1)
elif the_array[imid] < the_key:
return searchRB(the_array, the_key, imid+1, imax)
else:
return found
For example, if your were looking for the number 1 in a list of 1-100, this would find it on the first loop, unlike some other implementations.
However, I'm not sure if this actually improves efficiency at all (except for certain edge cases), and if checking the first, mid, and end values in the list/array is actually detrimental once you continue to loop and have to check those three values every time.
Is this is good or bad implementation of this type of algorithm, or am I just splitting hairs?
Upvotes: 0
Views: 298
Reputation: 2936
The main big one is changing from recursive approach to using a while loop, saves having a call stack (since python doesn't have tail recursion).
You have small redundancies that can be optimised. Algorithm already optimised enough, don't over optimise unless you understand the compiler
If you're going down the tree on the left you'll be comparing the same imin over and over, however this whole line might be parallelised or done sequentially
if the_array[imid] == the_key or the_array[min] == the_key or the_array[imax] == the_key:
Also this could mess with cache performance as you will be have the_array[min] always being kept in cache. Sometimes compilers store a chunk from an array around the index you're trying to access in cache. You might be wasting even more cache than just for that 1 value.
Also statements like this could be optimized , you could just type return True , but again this should be picked up by the compiler.
found = True
return found
Not having found
as an object would optimise the code because that object wouldn't be stored in memory the whole time.
This else statement seems redundant as there's no possible way to get to that else
else
return found
Actual relevant optimisations will come from knowing more about the dataset.
If you are able to preprocess the data (or have more information about the data) you can do other algorithms.
Upvotes: 2