SXS
SXS

Reputation: 21

Dictionary of unknown size - find whether a word is in dictionary

Here is an interesting problem.

Given an interface to a dictionary. It is unknown size, distribution, and content. Sorted ascending.

Also we have just a one method

String getWord(long index) throws IndexOutOfBoundsException

Add one method to the API:

boolean isInDictionary(String word)

What would be the best implementation for this problem.

Upvotes: 0

Views: 711

Answers (4)

SXS
SXS

Reputation: 21

Here is my implementation

 boolean isWordInTheDictionary(String word){
    if (word == null){
        return false;
    }
    // estimate the length of the dictionary array
    long len=2;
    String temp= getWord(len);

    while(true){
        len = len * 2;
        try{
          temp = getWord(len);
        }catch(IndexOutOfBoundsException e){
           // found upped bound break from loop
           break;
        }
    }

    // Do a modified binary search using the estimated length
    long beg = 0 ;
    long end = len;
    String tempWrd;
    while(true){
        System.out.println(String.format("beg: %s, end=%s, (beg+end)/2=%s ", beg,end,(beg+end)/2));
        if(end - beg <= 1){
            return false;
        }
        long idx = (beg+end)/2;
        tempWrd = getWord(idx);
        if(tempWrd == null){
            end=idx;
            continue;
        }
        if ( word.compareTo(tempWrd) > 0){
            beg = idx;
        }
        else if(word.compareTo(tempWrd) < 0){
            end= idx;
        }else{
            // found the word..
            System.out.println(String.format("getword at index: %s, =%s", idx,getWord(idx)));
            return true;
        }
    }
}

Let me know if this is correct

Upvotes: 1

D. Cannone
D. Cannone

Reputation: 78

duedl0r is correct, you can't assume that the dictionary will be ordered.

not having any other information, probably random search is the best algorithm that you can choose (after having estimated the size or during the estimation)

just for correcteness, in the second part of your algorithm you should check for exceptions and handle them, because, as you had said in the comment, your estimate is only an upper bound and during getWord there is the possibility that you will catch one

edit: just to give a better explanation
search in an unsorted list has lower bound for time complexity equals to O(n)
randomized search has complexity equals to O(k), where k is the iterations in search. so, you can decide k. but it is important to understand that randomized search does not guarantee success
when n, size of the dictionary, is very big, you can set k to a number of some orders lower than n and having high probability to find the word

Upvotes: 0

duedl0r
duedl0r

Reputation: 9424

No, the words inside the dictionary probably aren't sorted. So you have to iterate through the dictionary and check every word if it is the one you're looking for.

If it is sorted, you're solution can be improved. The first loop only has to find out the right most entry after your word, you're searching.

Upvotes: 0

James Waldby - jwpat7
James Waldby - jwpat7

Reputation: 8711

Let's suppose that your hypothetical data structure, with its single method, String getWord(long index), is based on a Dictionary that implements the usual Dictionary operations:

  • addition of pairs to the collection
  • removal of pairs from the collection
  • modification of the values of existing pairs
  • lookup of the value associated with a particular key

but the methods for all but the last have been hidden from you.

If that is the case, then your code definitely is not correct, because there is no reason to suppose that the dictionary stores values in any particular order, hence your binary search for items, using word.compareTo(), cannot be expected to work.

Also, you don't have catch code for index numbers between the dictionary size and len, the power of two that you found to be larger than the dictionary size, which need not be a power of two, so even if you changed to linear search instead of binary, you'd have an unhandled exception for words not in dictionary.

Upvotes: 0

Related Questions