Margarita
Margarita

Reputation: 1303

Identify 5 "forbidden" characters that result in *fewest* exclusions from word list

From "Think Python" - The author provides word filtering exercises (tasks are to include/exclude words from a list based on minimum length, characters required, or forbidden, etc.)

An extra question he includes: Can you find a combination of 5 forbidden letters that excludes the smallest number of words? (I found topics here and elsewhere generally related to above exercises, but not an algorithm/answer for this extra question.) Here's my start in working it out, and where I got stuck:

  1. For each character in the word list identify the number of words it occupies

  2. Build a dictionary with each key = to a given character; each key value = total number of words occupied by that character.

  3. Sort by value to identify the 5 characters, in ascending order, that occupy the least number of words.

I'm a bit stuck at this point - because if characters jointly occur in some words in various combinations, that can reduce the total number of words they cause to get excluded from this list.

I wasn't sure how to follow that reasoning to 'abstract' the problem and figure out a general solution. Any pointers?

Upvotes: 2

Views: 327

Answers (1)

M Oehm
M Oehm

Reputation: 29126

Your approach will find an upper bound for the set of forbidden characters. You can use sets and unions of sets to find out whether there is a set of characters that is better than your upper-bound set.

The following approach should work, but it will create large sets:

  • Create a dictionary with the 26 letters as keys and with an empty set as value. Read the words and add them to the sets for the letters that they contain.

  • Find the letters with the five smallest word sets. The sum of the set lengths for these letters is your upper bound. Filter all letters whose sets are larger than that upper bound out of the dictionary.

  • Now find the union of all combinations of five of the remaining letters and find the one whose union is smallest. You can do that recursively.

Upvotes: 1

Related Questions