Bryce Thomas

Reputation: 10789

Python binary search-like function to find first number in sorted list greater than a specific value

I'm trying to write a function in Python that finds the first number in a sorted list greater than a specific value that I pass in as an argument. I've found examples online that use simple list comprehensions to achieve this, but for my purposes I need to be performing this operation frequently and on large lists, so a search that runs in linear time is too expensive.

I've had a crack at writing an iterative binary search-like function to achieve this, though I'm coming across some edge cases where it doesn't work correctly. By the way, the function is not required to deal with a case where there is no larger item in the list. Here is my existing function:

def findFirstLarger(num, sortedList):
    low = 0; 
    high = len(sortedList) - 1

    mid = -1
    while True:
        print("low: " + str(low) + "\t high: " + str(high))
        if (low > high):
            print("Ah geez, low is " + str(low) + " and high is " + str(high))
            return # debugging, don't want this to happen
        if low == high:
            return sortedList[low]
        else:
            mid = (low + high) / 2;
            if num == sortedList[mid]:
                return sortedList[mid]
            elif num > sortedList[mid]:
                low = mid + 1
            else:
                high = mid - 1

One case I have noted where this function does not work is as follows:

>>> somenumbers=[n*2 for n in range(131072)]
>>> somenumbers[-5:]
[262134, 262136, 262138, 262140, 262142]


>>> binsearch.findFirstLarger(262139,somenumbers)
low: 0   high: 131071
low: 65536   high: 131071
low: 98304   high: 131071
low: 114688  high: 131071
low: 122880  high: 131071
low: 126976  high: 131071
low: 129024  high: 131071
low: 130048  high: 131071
low: 130560  high: 131071
low: 130816  high: 131071
low: 130944  high: 131071
low: 131008  high: 131071
low: 131040  high: 131071
low: 131056  high: 131071
low: 131064  high: 131071
low: 131068  high: 131071
low: 131070  high: 131071
low: 131070  high: 131069
Ah geez, low is 131070 and high is 131069

Here the correct result would be 262140, as this is the first number in the list greater than 262139.

Can anyone recommend a cleaner implementation of this that actually works? I didn't think this would be such an esoteric problem, though I haven't been able to find a solution anywhere as of yet.

Upvotes: 4

Views: 11005

Answers (2)

Ali

Reputation: 3231

If you need an implementation that does not use the bisect module, you can try the following code:

def findFirstLargerOrEqual(num, sortedList):
    '''Return the first element of sortedList
    that is greater than or equal to num.'''

    slen = len(sortedList)
    start = 0

    while slen > 0:
        m = start + slen//2

        if sortedList[m] < num:
            slen = slen - (m+1 - start)
            start = m+1
            continue

        if start < m and sortedList[m-1] >= num:
            slen = m - start
            continue

        return sortedList[m]  # use the parameter, not the global somenumbers

    raise ValueError('Not found')

somenumbers=[n*2 for n in range(131072)]
print(findFirstLargerOrEqual(262139, somenumbers)) #output: 262140

Upvotes: 0

kennytm

Reputation: 523214

Have you tried the bisect module?

from bisect import bisect_left

def find_ge(a, key):
    '''Find smallest item greater-than or equal to key.
    Raise ValueError if no such item exists.
    If multiple keys are equal, return the leftmost.

    '''
    i = bisect_left(a, key)
    if i == len(a):
        raise ValueError('No item found with key at or above: %r' % (key,))
    return a[i]

find_ge(somenumbers, 262139)
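
Note that find_ge returns an item greater than or equal to the key. If you want strictly greater, as the question describes, a minimal sketch along the same lines is to swap in bisect_right, which returns the insertion point just past any items equal to the key. The name find_gt and the error message here are only illustrative:

```python
from bisect import bisect_right

def find_gt(a, key):
    """Return the leftmost item strictly greater than key.

    Raise ValueError if no such item exists.
    """
    i = bisect_right(a, key)  # index just past any items equal to key
    if i == len(a):
        raise ValueError('No item found with key above: %r' % (key,))
    return a[i]

somenumbers = [n * 2 for n in range(131072)]
print(find_gt(somenumbers, 262139))  # 262140
print(find_gt(somenumbers, 262140))  # 262142 (equal items are skipped)
```

With find_ge the second call would return 262140 itself, so which one you want depends on how an exact match should be treated.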

Your code is wrong in two ways: (1) low > high is a valid termination case, not an error; (2) you should not stop at low == high, e.g. it will return an incorrect index when num == 3 for your somenumbers.
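
If you would rather keep a hand-written loop, here is a rough sketch of how those two points could be applied to your function. It assumes you want the first element strictly greater than num. The snake_case name and the ValueError for the "nothing larger" case are my own choices, not part of your original code:

```python
def find_first_larger(num, sorted_list):
    """Return the first element of sorted_list strictly greater than num."""
    low = 0
    high = len(sorted_list) - 1
    # Keep going while low == high, and treat low > high as normal termination.
    while low <= high:
        mid = (low + high) // 2  # integer division, so mid is a valid index
        if sorted_list[mid] > num:
            high = mid - 1  # sorted_list[mid] is a candidate; look further left
        else:
            low = mid + 1   # sorted_list[mid] <= num, so the answer is to the right
    # Here low == high + 1: everything before low is <= num,
    # everything from low onward is > num.
    if low == len(sorted_list):
        raise ValueError('No item larger than %r' % (num,))
    return sorted_list[low]

somenumbers = [n * 2 for n in range(131072)]
print(find_first_larger(262139, somenumbers))  # 262140
```

There is no early return when num == sorted_list[mid], because an equal element is not larger than num. The loop simply moves past it.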

Upvotes: 22
