viky
viky

Reputation: 91

How to make binary search of a stored array to be stable

Here is the code for the binary search for an element in a sorted array:

#include<stdio.h>
int binarySearch(int *arr, int l, int r, int data)
{
    if(l > r)
        return -1;

    int mid = l+(r-l)/2;    //find the middle index 

    if(data < arr[mid]) {
        return(binarySearch(arr, l, mid-1, data));
    }
    else if(data > arr[mid]) {
        return(binarySearch(arr, mid+1, r, data));
    }
    else {
        return mid;
    }        
}

int main()
{
    int arr [] = {0 , 11, 22, 33, 44, 55, 66 };
    int n = sizeof(arr)/sizeof(arr[0]);     
    int data = 22;
    int index = binarySearch(arr, 0, n-1, data);
    if( index != -1) 
    {
          printf("%d" , index);
    }
    return 0;          
}

How can I make the search stable? When the elements of the array are repeated, then my search should return the index of the first occurrence of the data in the array.

I want my modified code to produce as output:

input array is {1, 22, 22, 22}
output = 1, 
input array is {1, 12, 15, 22, 22, 22, 22, 22, 22, 22, 55 ,66}
output = 3

I can't see how to do that.

Upvotes: 2

Views: 649

Answers (5)

davmac
davmac

Reputation: 20641

You would change the condition for a match from arr[mid] == data to the more complicated arr[mid] == data && (mid == 0 || arr[mid-1] != data). Change:

    else {
        return mid;
    }        

to:

    else if (mid == 0 || arr[mid-1] != data) {
        // note that arr[mid] == data is implied at this point
        return mid;
    }
    else {
        return(binarySearch(arr, l, mid, data));
    }

This still gives you O(log(n)) performance in case there is a large run of the searched-for value within the array (compared to some other simpler solutions, which degrade to O(n) performance in that case). You also keep the O(1) best case from the original search: that is, the result is potentially found without any recursion taking place.

Note that it does assume that it is ok to access the array outside the lower (l) bound iff that bound is not 0, whereas the original code does not make such an assumption. In the example you post, this is not a problem. If it was a problem, you could either pass the original bound down (as, say ol, and then mid == 0 in the above becomes mid == ol), or instead use:

else if (mid == l) {
    return mid;
}
else {
    return(binarySearch(arr, l, mid - 1, data));
}

The latter loses you the O(1) best case, however.

Upvotes: 3

Jarod42
Jarod42

Reputation: 217870

With <algorithm>, you may do

int binarySearch(const int *arr, int l, int r, int data)
{
    // inclusive `r` for binarySearch
    auto it = std::lower_bound(arr + l, arr + r + 1, data);

    if (it == arr + r + 1 || *it != data) {
        return -1;
    }
    return std::distance(arr + l, it);
}

Demo

Upvotes: 0

MrSmith42
MrSmith42

Reputation: 10151

Depending on how many equal elements you expect here a re two approaches:

  1. simply go backwards in the list starting from the found element until you reached the first of the equal elements (takes O(n) n = number of equal elements)

  2. search again in the subarray starting with index 0 and ending with the index of the found element. Do this until the new sound element has the same index as the one found before.

Here an illustration of version 2 (let each character be an element) and be seek for B

AAAABBBBBBBBBBBBBBBBBBBBBBCDDDDEEEFFFZ
^                                    ^  search range

AAAABBBBBBBBBBBBBBBBBBBBBBCDDDDEEEFFFZ
^                 !                  ^  found at position !

AAAABBBBBBBBBBBBBBBBBBBBBBCDDDDEEEFFFZ
^                 ^  new search range

AAAABBBBBBBBBBBBBBBBBBBBBBCDDDDEEEFFFZ
^        !        ^  found at position ! 
(different from previous finding position)

AAAABBBBBBBBBBBBBBBBBBBBBBCDDDDEEEFFFZ
^        ^  new search range

AAAABBBBBBBBBBBBBBBBBBBBBBCDDDDEEEFFFZ
^    !   ^   found at position ! 
(different from previous finding position)

AAAABBBBBBBBBBBBBBBBBBBBBBCDDDDEEEFFFZ
^    ^  new search range

AAAABBBBBBBBBBBBBBBBBBBBBBCDDDDEEEFFFZ
^   !^   found at position ! 
(different from previous finding position)

AAAABBBBBBBBBBBBBBBBBBBBBBCDDDDEEEFFFZ
^   ^  new search range

AAAABBBBBBBBBBBBBBBBBBBBBBCDDDDEEEFFFZ
^   !  found at same position as before => lirst one

Upvotes: 1

V. Kravchenko
V. Kravchenko

Reputation: 1904

Here I've changed your code so it checks every element to the left of found if it is equal to searched element, too.

    if(data < arr[mid]) {
        return(binarySearch(arr, l, mid-1, data));
    }
    else if(data > arr[mid]) {
        return(binarySearch(arr, mid+1, r, data));
    }
    else {
        while(mid && data == arr[--mid]);
        return mid + 1;
    }      

But it can be slow if your entire array, for example, consists of same elements. Other solution is continuing search, but you need to keep in mind, that found element is valid and may be the only valid one, so you should never lose it on next recursive call (use mid instead of mid - 1 or mid + 1). Here is the code (sorry for changing the formatting).

   if (data == arr[mid]) {
        if (r - l == 0) {
            return mid;
        }
        return binarySearch(arr, l, mid, data);
    }
    if(data < arr[mid])
        return binarySearch(arr, l, mid-1, data);
    return binarySearch(arr, mid+1, r, data);

Upvotes: 0

GMichael
GMichael

Reputation: 2776

Consider replacing return mid; inside int binarySearch(int *arr, int l, int r, int data) with the following:

for(; (mid > 0) && (data == arr[mid]); mid--);
return (data == arr[mid]) ? mid : mid + 1;

Upvotes: 0

Related Questions