RyanW
RyanW

Reputation: 5408

Binary Search with an unknown number of items

Assuming you don't know the number of elements you are searching and given an API that accepts an index and will return null if you are outside the bounds (as implemented here with the getWordFromDictionary method), how can you perform a binary search and implement the isWordInDictionary() method for client programs?

This solution works, but I ended up doing a serial search above the level where I found an initial high-index value. The search through the lower range of values was inspired by this answer. I also peeked at BinarySearch in Reflector (C# decompiler), but that has a known list length, so still looking to fill in the gaps.

private static string[] dictionary;

static void Main(string[] args)
{
    dictionary = System.IO.File.ReadAllLines(@"C:\tmp\dictionary.txt");

    Console.WriteLine(isWordInDictionary("aardvark", 0));
    Console.WriteLine(isWordInDictionary("bee", 0));
    Console.WriteLine(isWordInDictionary("zebra", 0));
    Console.WriteLine(isWordInDictionaryBinary("aardvark"));
    Console.WriteLine(isWordInDictionaryBinary("bee"));
    Console.WriteLine(isWordInDictionaryBinary("zebra"));
    Console.ReadLine();
}

static bool isWordInDictionaryBinary(string word)
{
    // assume the size of the dictionary is unknown

    // quick check for empty dictionary
    string w = getWordFromDictionary(0);
    if (w == null)
        return false;

    // assume that the length is very big.
    int low = 0;
    int hi = int.MaxValue;

    while (low <= hi)
    {
        int mid = (low + ((hi - low) >> 1));
        w = getWordFromDictionary(mid);

        // If the middle element m you select at each step is outside 
        // the array bounds (you need a way to tell this), then limit
        // the search to those elements with indexes small than m.
        if (w == null)
        {
            hi = mid;
            continue;
        }

        int compare = String.Compare(w, word);
        if (compare == 0)
            return true;

        if (compare < 0)
            low = mid + 1;
        else
            hi = mid - 1;
    }

    // punting on the search above the current value of hi 
    // to the (still unknown) upper limit
    return isWordInDictionary(word, hi);
}


// serial search, works good for small number of items
static bool isWordInDictionary(string word, int startIndex) 
{
    // assume the size of the dictionary is unknown
    int i = startIndex;
    while (getWordFromDictionary(i) != null)
    {
        if (getWordFromDictionary(i).Equals(word, StringComparison.OrdinalIgnoreCase))
            return true;
        i++;
    }

    return false;
}

private static string getWordFromDictionary(int index)
{
    try
    {
        return dictionary[index];
    }
    catch (IndexOutOfRangeException)
    {
        return null;
    }
}

Final Code after answers

static bool isWordInDictionaryBinary(string word)
{
    // assume the size of the dictionary is unknown

    // quick check for empty dictionary
    string w = getWordFromDictionary(0);
    if (w == null)
        return false;

    // assume that the number of elements is very big
    int low = 0;
    int hi = int.MaxValue;

    while (low <= hi)
    {
        int mid = (low + ((hi - low) >> 1));
        w = getWordFromDictionary(mid);

        // treat null the same as finding a string that comes 
        // after the string you are looking for
        if (w == null)
        {
            hi = mid - 1;
            continue;
        }

        int compare = String.Compare(w, word);
        if (compare == 0)
            return true;

        if (compare < 0)
            low = mid + 1;
        else
            hi = mid - 1;
    }

    return false;
}

Upvotes: 1

Views: 3066

Answers (3)

user684934
user684934

Reputation:

Sure you can. Start at index one, and double your query index until you hit something that's lexographically larger than your query word(Edit: or null). Then you can narrow down your search space again until you find the index, or return false.

Edit: Note that this does NOT add to your asymptotic runtime, and it is still O(logN), where N is the number of items in the series.

Upvotes: 2

Pablo
Pablo

Reputation: 8644

You can implement a binary search in two phases. In the first phase, you grow the size of the interval you're searching in. Once you detect you're outside the bounds, you can do a normal binary search in the latest interval you found. Something like this:

bool isPresentPhase1(string word)
{
  int l = 0, d = 1;
  while( true ) // you should eventually reach an index out of bounds
  {
    w = getWord(l + d);
    if( w == null )
      return isPresentPhase2(word, l, l + d - 1);
    int c = String.Compare(w, word);
    if( c == 0 )
      return true;
    else if( c < 0 )
      isPresentPhase2(value, l, l + d - 1);
    else
    {
      l = d + 1;
      d *= 2;
    } 
  }
}

bool isPresentPhase2(string word, int lo, int hi)
{
    // normal binary search in the interval [lo, hi]
}

Upvotes: 4

Sasha Aickin
Sasha Aickin

Reputation: 44

So, I'm not sure I entirely understand the problem from your description, but I'm assuming you're trying to search through a sorted array of unknown length to find a particular string. I'm also assuming that there are no nulls in the actual array; the array only returns null if you ask for an index that's out of bounds.

If those things are true, the solution should be just a standard binary search, albeit one where you search over the entire integer space, and you just treat null the same as finding a string that comes after the string you are looking for. Essentially just imagine that your sorted array of N strings is really a sorted array of INT_MAX strings sorted with nulls at the end.

What I don't quite understand is that you seem to basically have done that already (at least from a cursory look at the code), so I think I might not understand your problem completely.

Upvotes: 0

Related Questions