swagrov

Reputation: 1460

Where to start: what set of N letters makes the most words?

I'm having trouble coming up with a non-brute-force approach to a problem I've been wondering about: what set of N letters can be used to make the most words from a given dictionary? Letters can be used any number of times.

For example, for N=3, EST gives words like TEST, SEE, etc.

Searching online, I found some answers (such as EST for N=3, listed above), but no description of the approach.

My question is: what well-known problems are similar to this, or what principles should I use to tackle this problem?

NOTE: I know it's not necessarily true that if EST is the best for N=3, then ESTx is the best for N=4. That is to say, you can't just append a letter to the previous solution.

In case you're wondering, this question came to mind because I was wondering what set of 4 ingredients could make the most cocktails, and I started searching for that. Then I realized my question was specific, and so I figured this letter question is the same type of problem, and started searching for it as well.

Upvotes: 4

Views: 323

Answers (2)

swagrov

Reputation: 1460

So I coded up a solution to this problem that uses a hash table to get things done. I had to deal with a few problems along the way too!

  1. Let N be the size of the group of letters you are looking for that can make the most words. Let L be the length of the dictionary.
  2. Convert each word in the dictionary into a set of letters: 'test' -> {'e','s','t'}
  3. For each number 1 to N inclusive, create a cut list that contains the words you can make with exactly that many letters.
  4. Make a hash table for each number 1 to N inclusive, then go through the corresponding cut list and use the set as a key, and increment by 1 for each member of the cut list.
  5. This was the part that gave me trouble! Create a set out of your cut list (unique_cut_list) for N. This is essentially all the populated key-value pairs for the hash table for N.
  6. For each set in unique_cut_list, generate all subsets, and check the corresponding hash table (the size of the subset) to see if there is a value. If there is, add that value to the hash table for N with the key of the original set.
  7. Finally, go through the hash table and find the max value. The corresponding key is the group of letters you're after.

Steps 1-5 go through the dictionary 1+2N times; step 6 goes through a reduced version of the dictionary and checks (2^N)-1 subsets each time (ignoring the null set). That gives O(2NL + L*2^N), which approaches O(L*2^N). Not bad, since N will not be too big in most applications!
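A minimal Python sketch of the steps above (function and variable names are my own; it assumes `words` is an iterable of lowercase strings):

```python
from itertools import combinations

def best_letter_group(words, n):
    # Step 2: each word becomes its set of distinct letters.
    skeletons = [frozenset(w) for w in words]

    # Steps 3-4: one hash table per letter-set size 1..n,
    # counting how many words reduce to each letter set.
    counts = {k: {} for k in range(1, n + 1)}
    for s in skeletons:
        if len(s) <= n:
            counts[len(s)][s] = counts[len(s)].get(s, 0) + 1

    # Step 5: the distinct size-n letter sets seen in the dictionary.
    unique_cut_list = set(counts[n])

    # Step 6: for each size-n set, add the counts of all its
    # non-empty proper subsets (words spellable with fewer letters).
    totals = {}
    for s in unique_cut_list:
        total = counts[n][s]
        for r in range(1, n):
            for sub in combinations(s, r):
                total += counts[r].get(frozenset(sub), 0)
        totals[s] = total

    # Step 7: the key with the max value is the group of letters.
    return max(totals, key=totals.get)
```

For example, `best_letter_group(["test", "see", "set", "tee", "dog"], 3)` returns `frozenset("est")`, since EST covers TEST, SET, SEE, and TEE.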

Upvotes: 0

user31264

Reputation: 6737

For each word in the dictionary, sort its letters and remove duplicates; call the result the skeleton of the word. For each skeleton, count how many words produce it; call this its frequency. Ignore all skeletons whose size is greater than N.

Let a subskeleton be the result of removing 1 or more letters from a skeleton; e.g. EST has subskeletons E, S, T, ES, ET, ST. For each skeleton of size N, add the frequencies of this skeleton and all its subskeletons. Select the skeleton with the maximal sum.

You need O(2**N*D) operations, where D is the size of the dictionary.

Correction: we need to take into account all skeletons of size up to N (not only skeletons of words), and the number of operations will be O(2**N*C(L,N)), where L is the number of letters (26 in English).
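A short Python sketch of the corrected version (names are my own): instead of only examining skeletons that occur as words, it tries every N-letter combination of the alphabet and sums the frequencies of all its subskeletons.

```python
from itertools import combinations
from collections import Counter
from string import ascii_lowercase

def best_skeleton(words, n):
    # Frequency of each skeleton of size <= n.
    freq = Counter(frozenset(w) for w in words if len(set(w)) <= n)

    best, best_count = None, -1
    # Consider all C(26, n) candidate skeletons, not just word skeletons.
    for combo in combinations(ascii_lowercase, n):
        # Sum the frequencies of the candidate and all its subskeletons.
        total = sum(freq.get(frozenset(sub), 0)
                    for r in range(1, n + 1)
                    for sub in combinations(combo, r))
        if total > best_count:
            best_count, best = total, frozenset(combo)
    return best
```

For N=3 this checks C(26,3) = 2600 candidates with 2^3 - 1 = 7 subskeletons each, matching the O(2**N * C(L,N)) bound.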

Upvotes: 2
