Genadi
Genadi

Reputation: 249

Find element in a sorted array

Given a sorted array A and an element x, I need to find an algorithm that returns the index of x in A or -1 if x is not in A. the time complexity of the algorithm should be Θ(logd) when d is the number of elements that appears before x in A, or if x is not in A, d is the number of elements that were before x if he was in A.

Binary search is not good enough because its best case is O(1). I thought of starting from the beginning of the array, and start checking the indexes that are powers of 2. but I got lost.

Upvotes: 1

Views: 10180

Answers (3)

Yerbol Sapar
Yerbol Sapar

Reputation: 41

If to compare different methods to find value in 1d sorted array, there is no strong winner for array sizes =<10000, each method has its own 'cons' depending on target position, array size and numpy/list option. However, for large numpy arrays(>1M) it is clear that bisect and searchsort are winners.

There are 5 methods bellow: imported bisect with verification, python loop, numba solution tal's answer at Numpy: find first index of value fast, User_Targaryen's answer above, np.searchsort with verification. All of them accept numpy/lists, except tal's answer, where the to-numpy-conversion is required for lists. Only bisect and find_index methods feels fine resisting to lists len increase.

Bellow is an example for 10K numpy array:

import numpy as np
from numba import jit
from timeit import timeit
import bisect
   
# bisect.bisect
def bisect_(nums,x):
    pos = bisect.bisect_left(nums, x)
    try: 
        if nums[pos] == x:
            return pos
        else:  # e.g., bisect_([1,3,4],2)
            return -1
    except:  # e.g. bisect_([1,3,4],5) 
        return -1


# python loop
def find_first(vec, item):
    for i, v in enumerate(vec):
        if item == v:
            return i
    return -1

# tal's answer at https://stackoverflow.com/questions/7632963/numpy-find-first-index-of-value-fast
@jit(nopython=True)
def find_firstj(vec, item):
    """return the index of the first occurence of item in vec"""
    for i, v in enumerate(vec):
        if item == v:
            return i
    return -1

# User_Targaryen's answer above
def binary_search(nums,low,high,x):
  while low<=high:
    mid = (low+high)//2
    if nums[mid]==x:
      return mid+1
    elif nums[mid]>x:
      high = mid-1
    else:
      low = mid+1
  return -1

def find_index(nums,x):
  i = 1
  l = len(nums)
  while i<l:
    if nums[i-1]==x:
      return i
    elif 2*i<l and nums[2*i-1]>x:
      return binary_search(nums,i-1,2*i-1,x)
    i = 2*i
  return binary_search(nums,i//2,l-1,x)

# numpy searchsort
def searchsort(u, x):
    pos=np.searchsorted(u, x)
    try: 
        if u[pos] == x:
            return pos
        else:  # e.g., bisect_([1,3,4],2)
            return -1
    except:  # e.g. bisect_([1,3,4],5)
        return -1

SIZE = 10000
u = np.arange(SIZE)

# at the beginning
print('---- at the beginning ----')
x = 40

print('bisect.bisect', timeit(lambda: bisect_(u, x), number=100)*10, 's')
print('find_first', timeit(lambda: find_first(u,x), number=100)*10, 's')
print('find_firstj', timeit(lambda: find_firstj(u,x), number=100)*10, 's')
print('find_index', timeit(lambda: find_index(u, x), number=100)*10, 's')
print('searchsort', timeit(lambda: searchsort(u, x), number=100)*10, 's')

# in the middle
print('---- in the middle ----')
x = 5040

print('bisect.bisect', timeit(lambda: bisect.bisect(u, x), number=100)*10, 's')
print('find_first', timeit(lambda: find_first(u,x), number=100)*10, 's')
print('find_firstj', timeit(lambda: find_firstj(u,x), number=100)*10, 's')
print('find_index', timeit(lambda: find_index(u, x), number=100)*10, 's')
print('searchsort', timeit(lambda: searchsort(u, x), number=100)*10, 's')

# in the end
print('---- at the end ----')


 x = 9940
    
    print('bisect.bisect', timeit(lambda: bisect.bisect(u, x), number=100)*10, 's')
    print('find_first', timeit(lambda: find_first(u,x), number=100)*10, 's')
    print('find_firstj', timeit(lambda: find_firstj(u,x), number=100)*10, 's')
    print('find_index', timeit(lambda: find_index(u, x), number=100)*10, 's')
    print('searchsort', timeit(lambda: searchsort(u, x), number=100)*10, 's')

For 10K numpy array the result is:

    ---- at the beginning ----
bisect.bisect 0.009455299004912376 s
find_first 0.01455735880881548 s
find_firstj 0.023144721053540707 s
find_index 0.010857589077204466 s
searchsort 0.005023379344493151 s
---- in the middle ----
bisect.bisect 0.002835090272128582 s
find_first 1.3125479291193187 s
find_firstj 0.003973201382905245 s
find_index 0.01642579911276698 s
searchsort 0.007164010312408209 s
---- at the end ----
bisect.bisect 0.0030333898030221462 s
find_first 3.0728489509783685 s
find_firstj 0.006884939502924681 s
find_index 0.01768168993294239 s
searchsort 0.007541531231254339 s

For 1.5M array (x = 40/750040/1499400) the result is:

---- at the beginning ----
bisect.bisect 0.004126268904656172 s
find_first 0.009320289827883244 s
find_firstj 0.0005324906669557095 s
find_index 0.006837178952991962 s
searchsort 0.004959029611200094 s
---- in the middle ----
bisect.bisect 0.003964949864894152 s
find_first 166.71787671046332 s
find_firstj 0.5532677914015949 s
find_index 0.024038420524448156 s
searchsort 0.004650468472391367 s
---- at the end ----
bisect.bisect 0.0036938488483428955 s
find_first 329.8780040605925 s
find_firstj 1.570841120555997 s
find_index 0.026646379847079515 s
searchsort 0.00493861036375165 s

Please, copypaste to check for other sizes and options.

Upvotes: 0

User_Targaryen
User_Targaryen

Reputation: 4225

I have a simple python implementation based on the power of 2's approach as discussed in the comments section. Please have a look:

def binary_search(nums,low,high,x):
  while low<=high:
    mid = (low+high)/2
    if nums[mid]==x:
      return mid+1
    elif nums[mid]>x:
      high = mid-1
    else:
      low = mid+1
  return -1

def find_index(nums,x):
  i = 1
  l = len(nums)
  while i<l:
    if nums[i-1]==x:
      return i
    elif 2*i<l and nums[2*i-1]>x:
      return binary_search(nums,i-1,2*i-1,x)
    i = 2*i
  return binary_search(nums,i/2,l-1,x)

def main():
  line = raw_input("Enter numbers: ").split()
  nums = []
  for ele in line:
    nums.append(int(ele))

  nums = sorted(nums)
  print "Sorted array: ",nums
  x = int(raw_input("Enter the element to find in sorted array: "))
  print find_index(nums, x)

main()

Firstly, it tries to find the target element by moving over indexes with a power of 2.

At any point if the current element exceeds the target element, then we do a binary search between the current index(current power of 2) and the previous index(previous power of 2).

The time complexity of the searching process is logd on an average. Also the best case time complexity is logd as expected!!

Hope it is easy to understand!!!!

Upvotes: 1

Matt Timmermans
Matt Timmermans

Reputation: 59144

You can do it like this: It uses Θ(log d) steps to find a range of size Θ(d), and then does a binary search in that range in another Θ(log d) steps.

int search(int[] array, int length, int valueToFind)
{
    int pos=0;
    int limit=min(length,1);
    while(limit < length && array[limit] < valueToFind)
    {
        pos=limit+1;
        limit = min(length, limit*2+1);
    }
    while(pos<limit)
    {
        int testpos = pos+((limit-pos)>>1);

        if (array[testpos]<valueToFind)
            pos=testpos+1;
        else
            limit=testpos;
    }
    return (pos < length && array[pos]==valueToFind ? pos : -1);
}

Note that the binary search I use does not exit early, and always takes Θ(log (limit-pos)) time. Even so it's faster than other searches that do exit early, because it does only one comparison per iteration. I describe other advantages here:

How can I simplify this working Binary Search code in C?

Upvotes: 1

Related Questions