Danny Milosavljevic
Danny Milosavljevic

Reputation: 642

How to find the insertion point in an array bsearch() works on?

using bsearch() in C (standard library) one can quickly find an entry in a sorted array.

However, how do I calculate where to insert a new entry (using the standard library)?

bsearch() specifically checks whether the found item's key is equal to the passed key and if it isn't, it returns NULL - so can't use that.

Upvotes: 5

Views: 2520

Answers (5)

user215626
user215626

Reputation: 1

Further to @Leo's excellent answer: there is one final wrinkle which you'll run into if you try to make this work. I just did:-)

You need to (as @phoxis suggested) also store the result of the last comparison, because if the result of the last comparison between the new key and the array element that was last compared was > 0, the insertion point is one element further on than the array element that was last compared.

So to fix this, add "int last_cmp" to the bsearch_insertion_state structure, sets it inside the comparison function:

int cmp = state->compar(state->key, b);
state->last_cmp = cmp;
return cmp;

and then (this may not be the most elegant method) alter the return type of bsearch_insertion() from void * to bsearch_insertion_state (i.e. return the whole structure), so that the caller may then write, in the test program:

bsearch_insertion_state state = bsearch_insertion(
    &key, data, sizeof(data) / sizeof(data[0]), sizeof(data[0]), cmp);
int *ipptr = state.last_visited;
if( state.last_cmp > 0 ) ipptr++;
int ip = ipptr - data;

// Should print "Insertion point: 4"
printf("Insertion point (insert key at position): %d\n", ip );

[Note: there may be a subtle difference in my definition of the insertion point compared to @Leo's, hence @Leo's version printed 3, mine prints 4 - but according to my understanding, the insertion point is the position at which the new key is inserted, i.e. first you shuffle up elements ip..endofarray up one, and then you store the key via data[ip] = key. For this particular test case, 4 is the correct insertion point.]

I have built my own version of this, with an array of char * strings, and strcmp() for the low-level comparisons, so my "ipptr" was a char **, and written the outer shuffle up and insert code, and I can verify that this appears to work for me, producing correctly sorted strings.

Upvotes: 0

Leo
Leo

Reputation: 1343

Here's an improvement upon @phoxis's answer that will make the code thread-safe and reentrant by avoiding any global variables. The trick is to use the key itself to store the last visited address.

bsearch_insertion.h

#include <stdlib.h>

#ifndef BSEARCH_INSERTION_H
#define BSEARCH_INSERTION_H

/* Just like bsearch(3), but return a pointer to the element after which
 * the key would need to be inserted in order to maintain sorted order. */
void *bsearch_insertion(
    const void *key, const void *base, size_t nel,
    size_t width, int (*compar)(const void *, const void *));

#endif /* BSEARCH_INSERTION_H */

bsearch_insertion.c

#include "bsearch_insertion.h"

typedef struct
{
    const void *key;
    int (*const compar)(const void *, const void *);
    void *last_visited;
} bsearch_insertion_state;

static int bsearch_insertion_compare(const void *a, const void *b)
{
    bsearch_insertion_state *state = (bsearch_insertion_state *) a;
    state->last_visited = (void *) b;
    return state->compar(state->key, b);
}

void *bsearch_insertion(
    const void *key, const void *base, size_t nel,
    size_t width, int (*compar)(const void *, const void *))
{
    bsearch_insertion_state state = {key, compar, NULL};
    bsearch(&state, base, nel, width, bsearch_insertion_compare);
    return state.last_visited;
}

Example: test.c

#include <stdio.h>
#include "bsearch_insertion.h"

static int cmp(const void *a, const void *b)
{
    int aint = *(const int *)a;
    int bint = *(const int *)b;
    return aint - bint;
}

int main(int argc, char **argv)
{
    int data[] = {0, 1, 2, 3, 5};
    int key = 4;
    void *result = bsearch_insertion(
        &key, data, sizeof(data) / sizeof(data[0]), sizeof(data[0]), cmp);
    /* Should print "Insertion point: 3" */
    printf("Insertion point: %d\n", (int *)result - data);
    return 0;
}

Upvotes: 5

phoxis
phoxis

Reputation: 61920

Its not clear from the question but possibly this is what you want:

You can do something like this to find the index in the array where bsearch () found the match.

if (bsearch_returned_address != NULL)
  index = (bsearch_returned_address - array_base_address)

EDIT

To know the location which the bsort last visited, check the below stuff out.

The good thing is that the manual says:

The compar routine is expected to have two arguments which point to the key object and to an array member, in that order, and should return an integer less than, equal to, or greater than zero if the key object is found, respectively, to be less than, to match, or be greater than the array member.

Therefore you can store the second argument inside the comparison function in a global variable, and in a case of the fail use the address in this variable, which points to the last location the bsearch function visited to find for a match.

For example:

A list with address and value:

[0x8d6010: 0][0x8d6014: 4][0x8d6018: 8][0x8d601c: 12][0x8d6020: 16][0x8d6024: 20][0x8d6028: 24][0x8d602c: 28][0x8d6030: 32][0x8d6034: 36]

Value to search

13

output

fmem: (nil)  //this is the memory location where it was found
last_mem1: 0x7fff8c8e6c54 //last val of 1st param of compare
last_mem2: 0x8d601c //last val of 2nd param of compare
*last_mem1: 13,        *last_mem2: 12

The sample compare function code is

static const int *last_mem1, *last_mem2;

static int
compmi(const void *a, const void *b)
{
    last_mem1 = a; last_mem2 = b;
    return *(int *)a - *(int *)b;
}

So you can insert after the address in last_mem2. Although there are terminal cases, if you find a key which is less than the first element, then last_mem2 will also have the address of the first element.

But how ever you have to shift the array elements to make place for the insertion, which will make the insertion complexity to O(n). You might want to improve performance by introducing some kind of lazy insertion, like make a separate unordered list, which is much smaller than your original list, and dump new elements there. When searching, perform bsearch in original list, and linear search in the dump. When the dump list grows past a certain threshold, you can merge the list by performing an insertion sort. But still, you can't be O(lg n).

Upvotes: 5

SKi
SKi

Reputation: 8466

Because insert causes copying of tail of array, time is O(n). So simple linear search won't slow down your code dramatically. You can even copy items during searching, if you start searching from the end of array.

Upvotes: 1

unwind
unwind

Reputation: 399871

Not sure what you mean by "calculate insertion place"; you build an array, then sort it using qsort(), then do (many) searches using bsearch().

In other words: for typical usage, you don't need to implement the array-sorting, since the standard library contains functionality for that, too.

Not sure about the connection to bisecting, here.

UPDATE: From the comment, it seems you're concerned about doing inserts into an array that you're also doing searches from. I would recommend looking at some other data structure that is more friendly towards inserts, such as a hash table for instance. By not relying on sorting to keep searches fast, a hash table might perform better. Inserting into an array involves moving all the subsequent elements, which is also quite costly and which is not needed for e.g. a hash table.

UPDATE 2: To actually try to answer your question, assuming you have a bsearch()-comparible comparator() function for your array of n entries, the index for the new item ni should be given by something like this:

size_t i;

for( i = 0; i < n && comparator(&ni, array + i) >= 0; ++i )
  ;
/* Grow array, copy i..n to (i+1)..(n+1), insert ni at i. */

Upvotes: 3

Related Questions