Eloff
Eloff

Reputation: 21666

Why does my binary search need an extra comparison? log2(N)+1

I want to find the index of the first integer in an array of integers which is <= key. I can do it with binary search in log2(N)+1 compares. Shouldn't it be possible with only log2(N) compares?

// Returns the index of the first integer in keys <= key. size must be a power of 2.
unsigned find(int key, int* keys, unsigned size) {    
    int* origKeys = keys;

    for (int i = 0; i < log2(size); i++) {
        size /= 2;

        if (keys[size] < key)
            keys += size;
    }

    unsigned i = unsigned(keys - origKeys);
    // Not only is this step messy, because it doesn't fit the loop, but
    // now we have log2(n) + 1 comparisons.
    if (*keys < key)
        i++;

    return i;
}

Upvotes: 5

Views: 1570

Answers (2)

bc1121
bc1121

Reputation: 579

Yes, using a binary search and assuming element is in the set, log2(N) compares will suffice.

However, if your element is not in the set, it could take you log2(N) + 1, compares to realize that you have exhausted the set. The final comparison is to an empty set which let's you realize you are done.

Personally, I wondered if the comparison to an empty set counts as a comparison, but the text I'm reading counts it as a step the computer would take increasing the runtime by one step.

Upvotes: 0

templatetypedef
templatetypedef

Reputation: 372784

Let's think about this from an information-theoretic perspective. If you have an array with n elements in it, there are n+1 possible spots where the new element can go: just before any element of the array, or after all the elements. Therefore, your algorithm needs to make enough comparisons to be able to uniquely identify which of the n+1 spots is the right one. If you don't make enough comparisons, the answer you give back won't always be right.

Every comparison you make can, in the best case, eliminate half of the possible positions. Therefore, in the theoretical limit, with k comparisons you can decide which of 2^k positions is right. Since there are n+1 possible positions, you need to make lg (n+1) comparisons in the worst case, rather than lg n. Since your n is a perfect power of two, this means that one extra comparison is required. On the other hand, if you have n be one less than a perfect power of two, then making ceil(lg n) comparisons should suffice.

Edited by Eloff, this code seems to give the right answer with log2(n+1) steps as you predicted:

// Returns the index of the first integer in keys <= key. size must be one less than a power of 2.
unsigned find(int key, int* keys, unsigned size) {    
    int* origKeys = keys;
    size++;

    while(size > 1) {
        size /= 2;

        if (keys[size-1] < key)
            keys += size;
    }

    return unsigned(keys - origKeys);        
}

Upvotes: 7

Related Questions