Could this recursive binary search algorithm be more efficient?

Question

I've seen a lot of different implementations of this algorithm, but I'm wondering if there are ways to improve the efficiency beyond just making the search binary. I've designed this particular version of the algorithm so the edges and midpoint of an array/list will be checked immediately for the key being searched for, as to avoid looping through a search when the key your looking for is just the first, middle, or last element.

def searchRB(the_array, the_key, imin, imax):
    print("searching")
    found = False
    if (0 > the_key or the_key > len(the_array)):
        return found
    else:
        imid = imin + ((imax - imin) // 2)
        if imid == the_key or imin == the_key or imax == the_key:
            found = True
            return found
        elif the_array[imid] > the_key:
            return searchRB(the_array, the_key, imin, imid-1)
        elif the_array[imid] < the_key:
            return searchRB(the_array, the_key, imid+1, imax)
        else:
            return found

For example, if your were looking for the number 1 in a list of 1-100, this would find it on the first loop, unlike some other implementations.

However, I'm not sure if this actually improves efficiency at all (except for certain edge cases), and if checking the first, mid, and end values in the list/array is actually detrimental once you continue to loop and have to check those three values every time.

Is this is good or bad implementation of this type of algorithm, or am I just splitting hairs?

tt_Gantz · Accepted Answer

The main big one is changing from recursive approach to using a while loop, saves having a call stack (since python doesn't have tail recursion).

You have small redundancies that can be optimised. Algorithm already optimised enough, don't over optimise unless you understand the compiler

If you're going down the tree on the left you'll be comparing the same imin over and over, however this whole line might be parallelised or done sequentially

if the_array[imid] == the_key or the_array[min] == the_key or the_array[imax] == the_key:

Also this could mess with cache performance as you will be have the_array[min] always being kept in cache. Sometimes compilers store a chunk from an array around the index you're trying to access in cache. You might be wasting even more cache than just for that 1 value.

Also statements like this could be optimized , you could just type return True , but again this should be picked up by the compiler.

found = True return found

Not having found as an object would optimise the code because that object wouldn't be stored in memory the whole time.

This else statement seems redundant as there's no possible way to get to that else else return found

Actual relevant optimisations will come from knowing more about the dataset.

If you are able to preprocess the data (or have more information about the data) you can do other algorithms.

Could this recursive binary search algorithm be more efficient?

Answers (1)

Related Questions