Mofe Ejegi
Mofe Ejegi

Reputation: 1073

Efficient algorithm for word creation in java

I'm developing a word game using the libGDX framework, and I want to implement a basic Hint feature whereby the system can generate a valid 7 letter word from the 17 letters provided on a board. Now, the setup of the 17 letters (to pick from) on board is completely random and so I cannot use pre-determined hint words for hints (like in 4 pics 1 word).

My solution was to do a little bit of combination by finding every possible 7 letter combination from the 17 letters provided. Next up, I permuted each of the combinations and cross checked them with a wordSet English Lexicon and the first valid word I obtain would be the hint word.

As you've already guessed, this process is tasking knowing that 17 Permutation 7 is already equals to 98 million possible permutations. To cover up for this, I used a FixedThreadPool to split the permutation tasks (into about 20 worker threads), and I now have a relatively fast word search, but the downside is, this causes serious lag in lower end devices.

Any suggestions on how to make a better hint word search algorithm or improve my method explained above would be appreciated.

Upvotes: 1

Views: 353

Answers (3)

samgak
samgak

Reputation: 24417

Here's a couple of ways to optimize it:

  • Instead of generating millions of combinations and testing them, iterate over the list of English words and test those against your 17 letters. Instead of doing millions of tests you'll be doing thousands.
  • Test if a word can be formed from your 17 letters using letter counts, instead of actually generating strings of letters in a particular order and comparing against words. This means you don't have to generate each possible ordering of letters.

To achieve the second optimization it's necessary to store the words in your word list in a way that you can easily compare letter counts. A simple way to do that is to store each word along with another string made up of all the letters sorted into alphabetical order. For example you'd store "cheese" and "ceeehs" as a pair. You can count the runs of letters and see if you have enough of each.

At the start of your process you can count how many of each letter you have in your 17 and store them in an array. Then go through your word list and test each word, something like this:

int[] letter_counts = new int[26]; // how many of each letter of the
                                   // alphabet you have in your 17 letters

boolean test_word(String word)  // pass in the sorted string e.g "ceeehs"
{
    char prev = '\0'; // use this for detecting repeating letters
    int count = 0;
    for(int i = 0; i < word.length(); i++)
    {
        char c = word.charAt(i);
        if(c != prev)
        {
            prev = c;
            count = 0;
        }
        count++;
        if(letter_counts[(int)c - 'a'] < count)
            return false; // not enough letters
    }
    return true;
}

To avoid a bias towards words earlier in the word list (since you stop as soon as you find one), you can start at a random place in the word list, and wrap around to the start.

This approach is by no means optimal, but will probably be fast enough and is simple to implement.

Upvotes: 1

user5063151
user5063151

Reputation:

Build a trie / prefix tree using your word list:

enter image description here

You can pick a random letter from the 17 available letters, and start traversing using only letters from the 17. The first node that has a flag saying this marks the end of a word can be a suggestion.

Lot's of predictive text programs use structures like this to guess the word you are typing or help you with a typo if you are not too far off in the graph from a node that marks the end of the word. It is better on space, in that the depth of the tree is determined by your longest word. But, you do not waste space on situations like:

do
dog
doggy
done
dunce

You only store:

                d
               / \
             *o   u
             / \   \
           *g   n   n
           /     \   \
          g      *e   c
         /             \
       *y              *e

Where * is a boolean flag that indicates where a word ends.

Time complexity:

Finding if a string of text is a word would by O(n) where n is the length of the word.

This site seems to have some optimization: https://www.geeksforgeeks.org/trie-insert-and-search/

Space, depends on if you implement with arrays, pointers, or hashmaps.

Upvotes: 2

Martijn
Martijn

Reputation: 417

I can't really test anything right now with regards to speed, but if you keep your lexicon sorted you could use a search algorithm on it: you first find all positions where the list of words which start with one of your 17 available letters. Then you look through each of these lists for those starting with one of the 16 other letters, and continue this until any word with an allowable length shows up.

As a simple example, consider the following:

String[] lexicon = {"parrot", "postal", "office", "spam"};
char[] letters = {'p', 'o', 's', 't', 'a', 'l'};

Which will yield the intermediate possibilities:

{"parrot", "postal", "office", "spam"}  (1 letter  matched)
{"parrot", "postal", "spam"}            (2 letters matched)
{"postal", "spam"}                      (3 letters matched)
{"parrot"}                              (4 letters matched)

At this point you can either continue your search algorithm or note that there is only one option left and use a different test to see if it is allowed.

This algorithm would require you to use C(17, 7) = 19,448 searches at most. You can also probably improve upon the costs of using a complete binary search for every single letter too.

Upvotes: 0

Related Questions