Kyle Asaff
Kyle Asaff

Reputation: 274

Using pairs of words from a dictionary with no letters in common, find a pair that maximizes the sum of the words' lengths

Question:

Using pairs of words from a dictionary with no letters in common, find a pair that maximizes the sum of the words' lengths

Example Dictionary: mouse, cow, join, key, dog

dog and key share no letters and have a sum of 3+3 = 6

mouse does not work with cow, join, or dog because they all share the letter 'o'

join and key share no letters and have a sum of 4+3 = 7

I had this question in an interview, my solution I came up with is outlined below. I was wondering if there is any way to make it more efficient? I used two BitSets to map the alphabet of two words and AND them together to see if they contain the same letters. I think my algorithm has a complexity is o(n!) which is inefficient, is there a better way to optimize my algorithm?

public static void maximumSum (String[] dictionary) {
    // ascii of a = 97
    BitSet word1 = new BitSet(26);
    BitSet word2 = new BitSet(26);

    String maxWord1 = "";
    String maxWord2 = "";
    int maxSum = -1;

    for(int i = 0; i<dictionary.length; i++) {
        for(int j = i+1; j<dictionary.length; j++) {
            String s1 = dictionary[i];
            String s2 = dictionary[j];
            for(int k = 0; k<s1.length(); k++) {
                word1.set(s1.charAt(k)-97);
            }
            for(int k = 0; k<s2.length(); k++) {
                word2.set(s2.charAt(k)-97);
            }
            word1.and(word2);
            if(word1.cardinality() == 0) {
                if(maxSum < s1.length()+s2.length()) {
                    maxWord1 = s1;
                    maxWord2 = s2;
                    maxSum = s1.length()+s2.length();
                }
            }
            word1.clear();
            word2.clear();
        }
    }
    if(maxSum == -1)
        System.out.println("All the words have letters in common.");
    else
        System.out.println("'"+maxWord1+"' and '"+maxWord2+"' 
        have a maximum sum of "+maxSum);
}

public static void main(String[] args) {
    String[] dictionary = {"mouse", "cow", "join", "key", "dog"};
    maximumSum(dictionary);
}

output:

'join' and 'key' have a maximum sum of 7

Upvotes: 4

Views: 850

Answers (2)

Douglas Zare
Douglas Zare

Reputation: 3316

Here is an approach that could be called O(n*avg), though perhaps it should be called O(n*avg+a*2^a) where a is the size of the alphabet, n is the number of words, and avg is the average length of the words in the dictionary, in case you might do this with larger alphabets. It would not be an improvement over the brute force method mentioned by Bogdan Pop if your dictionary has only a few thousand words, but it would be an improvement when the dictionary contains hundreds of thousands of words. For comparison, the SOWPODS dictionary used by some competitive Scrabble players contains over 267000 words. Most of this approach, before the last paragraph, is essentially the same as Michal Forišek's answer on Quora.

For each word w, determine its length l(w) and which letters it contains s(w).

For each of the 2^a subsets of the alphabet, determine the longest word containing exactly those letters by iterating through each word w, and replacing the stored length in s(w) with the max of itself and l(w).

For each of the 2^a subsets of the alphabet, determine the longest word containing at most those letters. This can be done in O(a*2^a) steps because each subset of size k covers k smaller subsets with k-1 letters. We compute the value inductively as the longest word containing exactly those letters or contained in one of the k-1 covered subsets.

Now we can iterate over all subsets of the alphabet summing the length of the longest word using exactly those letters, and the length of the longest word contained in the complement of that subset.

One unsatisfactory part of this approach is that the time increases a lot due to the inclusion of uncommon letters. The time taken increases roughly by a factor of 20 due to the Q, X, J, and Z, which aren't that common in words in general, and I think not in long words, either. So, suppose there are r words containing at least one of b rare letters. We can use the above algorithm on words with no rare letters in O(n*avg+(a-b)2^(a-b)) steps. Then we can brute-force pairs of words that both contain rare letters in O(r^2*a) steps. Finally, we can check for pairs of words where the first contains a rare letter but the second one doesn't by iterating through the r words containing a rare letter, checking the longest word with only ordinary letters in the complement. This last case takes O(r*avg) steps so it is asymptotically negligible. So, by choosing a few rare letters, we can reduce the time to O(n*avg+(a-b)2^(a-b)+r^2*avg). At least, with a standard English dictionary, there are rarely used letters so this would be a reduction.

Upvotes: 0

Bogdan Pop
Bogdan Pop

Reputation: 438

You can do this in O(N^2 * 26) (26 represents the number of characters in your dictionary, english alphabet probably).

FIRST Build a 2D bool array, D[i][j].

i - integer

j - character from 'a' to 'z' ( you can have the ASCII code instead of character )

D[i][j] = 1 if the word which is at index i contains the letter j, else D[i][j] = 0;

After you have this 2D array you can check for every pair of words if they have a letter in common ( you iterate for every pair, every letter of your dictionary). If they don't , you actualize the maximum sum.

Upvotes: 0

Related Questions