Reputation: 1073
I'm developing a word game using the libGDX framework, and I want to implement a basic Hint feature whereby the system can generate a valid 7 letter word from the 17 letters provided on a board. Now, the setup of the 17 letters (to pick from) on board is completely random and so I cannot use pre-determined hint words for hints (like in 4 pics 1 word).
My solution was to do a little bit of combination by finding every possible 7 letter combination from the 17 letters provided. Next up, I permuted each of the combinations and cross checked them with a wordSet English Lexicon and the first valid word I obtain would be the hint word.
As you've already guessed, this process is tasking knowing that 17 Permutation 7 is already equals to 98 million possible permutations. To cover up for this, I used a FixedThreadPool
to split the permutation tasks (into about 20 worker threads), and I now have a relatively fast word search, but the downside is, this causes serious lag in lower end devices.
Any suggestions on how to make a better hint word search algorithm or improve my method explained above would be appreciated.
Upvotes: 1
Views: 353
Reputation: 24417
Here's a couple of ways to optimize it:
To achieve the second optimization it's necessary to store the words in your word list in a way that you can easily compare letter counts. A simple way to do that is to store each word along with another string made up of all the letters sorted into alphabetical order. For example you'd store "cheese" and "ceeehs" as a pair. You can count the runs of letters and see if you have enough of each.
At the start of your process you can count how many of each letter you have in your 17 and store them in an array. Then go through your word list and test each word, something like this:
int[] letter_counts = new int[26]; // how many of each letter of the
// alphabet you have in your 17 letters
boolean test_word(String word) // pass in the sorted string e.g "ceeehs"
{
char prev = '\0'; // use this for detecting repeating letters
int count = 0;
for(int i = 0; i < word.length(); i++)
{
char c = word.charAt(i);
if(c != prev)
{
prev = c;
count = 0;
}
count++;
if(letter_counts[(int)c - 'a'] < count)
return false; // not enough letters
}
return true;
}
To avoid a bias towards words earlier in the word list (since you stop as soon as you find one), you can start at a random place in the word list, and wrap around to the start.
This approach is by no means optimal, but will probably be fast enough and is simple to implement.
Upvotes: 1
Reputation:
Build a trie / prefix tree using your word list:
You can pick a random letter from the 17 available letters, and start traversing using only letters from the 17. The first node that has a flag saying this marks the end of a word can be a suggestion.
Lot's of predictive text programs use structures like this to guess the word you are typing or help you with a typo if you are not too far off in the graph from a node that marks the end of the word. It is better on space, in that the depth of the tree is determined by your longest word. But, you do not waste space on situations like:
do
dog
doggy
done
dunce
You only store:
d
/ \
*o u
/ \ \
*g n n
/ \ \
g *e c
/ \
*y *e
Where *
is a boolean flag that indicates where a word ends.
Time complexity:
Finding if a string of text is a word would by O(n)
where n
is the length of the word.
This site seems to have some optimization: https://www.geeksforgeeks.org/trie-insert-and-search/
Space, depends on if you implement with arrays, pointers, or hashmaps.
Upvotes: 2
Reputation: 417
I can't really test anything right now with regards to speed, but if you keep your lexicon sorted you could use a search algorithm on it: you first find all positions where the list of words which start with one of your 17 available letters. Then you look through each of these lists for those starting with one of the 16 other letters, and continue this until any word with an allowable length shows up.
As a simple example, consider the following:
String[] lexicon = {"parrot", "postal", "office", "spam"};
char[] letters = {'p', 'o', 's', 't', 'a', 'l'};
Which will yield the intermediate possibilities:
{"parrot", "postal", "office", "spam"} (1 letter matched)
{"parrot", "postal", "spam"} (2 letters matched)
{"postal", "spam"} (3 letters matched)
{"parrot"} (4 letters matched)
At this point you can either continue your search algorithm or note that there is only one option left and use a different test to see if it is allowed.
This algorithm would require you to use C(17, 7) = 19,448
searches at most. You can also probably improve upon the costs of using a complete binary search for every single letter too.
Upvotes: 0