Reputation: 5408
Assuming you don't know the number of elements you are searching and given an API that accepts an index and will return null if you are outside the bounds (as implemented here with the getWordFromDictionary method), how can you perform a binary search and implement the isWordInDictionary() method for client programs?
This solution works, but I ended up doing a serial search above the level where I found an initial high-index value. The search through the lower range of values was inspired by this answer. I also peeked at BinarySearch in Reflector (C# decompiler), but that has a known list length, so still looking to fill in the gaps.
private static string[] dictionary;
static void Main(string[] args)
{
dictionary = System.IO.File.ReadAllLines(@"C:\tmp\dictionary.txt");
Console.WriteLine(isWordInDictionary("aardvark", 0));
Console.WriteLine(isWordInDictionary("bee", 0));
Console.WriteLine(isWordInDictionary("zebra", 0));
Console.WriteLine(isWordInDictionaryBinary("aardvark"));
Console.WriteLine(isWordInDictionaryBinary("bee"));
Console.WriteLine(isWordInDictionaryBinary("zebra"));
Console.ReadLine();
}
static bool isWordInDictionaryBinary(string word)
{
// assume the size of the dictionary is unknown
// quick check for empty dictionary
string w = getWordFromDictionary(0);
if (w == null)
return false;
// assume that the length is very big.
int low = 0;
int hi = int.MaxValue;
while (low <= hi)
{
int mid = (low + ((hi - low) >> 1));
w = getWordFromDictionary(mid);
// If the middle element m you select at each step is outside
// the array bounds (you need a way to tell this), then limit
// the search to those elements with indexes small than m.
if (w == null)
{
hi = mid;
continue;
}
int compare = String.Compare(w, word);
if (compare == 0)
return true;
if (compare < 0)
low = mid + 1;
else
hi = mid - 1;
}
// punting on the search above the current value of hi
// to the (still unknown) upper limit
return isWordInDictionary(word, hi);
}
// serial search, works good for small number of items
static bool isWordInDictionary(string word, int startIndex)
{
// assume the size of the dictionary is unknown
int i = startIndex;
while (getWordFromDictionary(i) != null)
{
if (getWordFromDictionary(i).Equals(word, StringComparison.OrdinalIgnoreCase))
return true;
i++;
}
return false;
}
private static string getWordFromDictionary(int index)
{
try
{
return dictionary[index];
}
catch (IndexOutOfRangeException)
{
return null;
}
}
Final Code after answers
static bool isWordInDictionaryBinary(string word)
{
// assume the size of the dictionary is unknown
// quick check for empty dictionary
string w = getWordFromDictionary(0);
if (w == null)
return false;
// assume that the number of elements is very big
int low = 0;
int hi = int.MaxValue;
while (low <= hi)
{
int mid = (low + ((hi - low) >> 1));
w = getWordFromDictionary(mid);
// treat null the same as finding a string that comes
// after the string you are looking for
if (w == null)
{
hi = mid - 1;
continue;
}
int compare = String.Compare(w, word);
if (compare == 0)
return true;
if (compare < 0)
low = mid + 1;
else
hi = mid - 1;
}
return false;
}
Upvotes: 1
Views: 3066
Reputation:
Sure you can. Start at index one, and double your query index until you hit something that's lexographically larger than your query word(Edit: or null). Then you can narrow down your search space again until you find the index, or return false.
Edit: Note that this does NOT add to your asymptotic runtime, and it is still O(logN), where N is the number of items in the series.
Upvotes: 2
Reputation: 8644
You can implement a binary search in two phases. In the first phase, you grow the size of the interval you're searching in. Once you detect you're outside the bounds, you can do a normal binary search in the latest interval you found. Something like this:
bool isPresentPhase1(string word)
{
int l = 0, d = 1;
while( true ) // you should eventually reach an index out of bounds
{
w = getWord(l + d);
if( w == null )
return isPresentPhase2(word, l, l + d - 1);
int c = String.Compare(w, word);
if( c == 0 )
return true;
else if( c < 0 )
isPresentPhase2(value, l, l + d - 1);
else
{
l = d + 1;
d *= 2;
}
}
}
bool isPresentPhase2(string word, int lo, int hi)
{
// normal binary search in the interval [lo, hi]
}
Upvotes: 4
Reputation: 44
So, I'm not sure I entirely understand the problem from your description, but I'm assuming you're trying to search through a sorted array of unknown length to find a particular string. I'm also assuming that there are no nulls in the actual array; the array only returns null if you ask for an index that's out of bounds.
If those things are true, the solution should be just a standard binary search, albeit one where you search over the entire integer space, and you just treat null the same as finding a string that comes after the string you are looking for. Essentially just imagine that your sorted array of N strings is really a sorted array of INT_MAX strings sorted with nulls at the end.
What I don't quite understand is that you seem to basically have done that already (at least from a cursory look at the code), so I think I might not understand your problem completely.
Upvotes: 0