Maverik
Maverik

Reputation: 2398

Find out if a word can be the start of a word in a dictionary

I have to find out if a given word can be the start of other words in a dictionary.

I implemented the dictionary by using TreeSet.

TreeSet dictionary String startString;

Problem 1

What's the most efficient way to find out if startString is the start o at a least a word in the dictionary?

Idea 1

My idea is to use dictionary.subSet(startString, startStringPlusOne);

Where startStringPlusOne is equals to startString except for the last character, which is the following one in the alphabet.

Example:

startString: hom
startStringPlusOne: hon

In this way it SubSet returns an empty set this means that string is not the start of a word in the dictionary.

Problem 2

What's the most efficient way for computing stringPlusOne?

Idea 2

I thought to use an array of chars with the alphabet letters and replace the last letter of string with the following char in the array. Is there a more efficient way?

Upvotes: 4

Views: 240

Answers (1)

izb
izb

Reputation: 51820

If memory is not a concern, I'd be tempted to store two dictionaries. Into one you put your words, into another you put your word beginnings.

1)
["aardvark", "banana", "band"]

2)
{
    "aardvar" => 1,
    "aardva" => 1,
    "aardv" => 1,
    "aard" => 1, 
    "aar" => 1, 
    "aa" => 1, 
    "a" => 1, 
    "banan" => 1, 
    "bana" => 1, 
    "ban" => 2, 
    "ba" => 2, 
    "b" => 2
}

So the answer to the question "Are there any words beginning with 'ban'?" is "Yes, there are 2". Your question doesn't say if it would then be necessary to find out which words those are.

The count is only really useful if you are ever required to remove words from the dictionary. If so, you will need to decrement the count and remove keys when they reach 0. If you don't need to do this, then you don't need to store the number.

If you need to answer the question "Which words start with 'ban'?", then you'll need to store references to those words instead of just a count, e.g.

"ban" => ["banana", "band"]

This would seem to be the most efficient in terms of speed, at the expense of efficiency in terms of memory (Which may not be a concern worth worrying about).

Upvotes: 1

Related Questions