user2675364
user2675364

Reputation: 119

Given a list of words, how do you find common letters that overlap

I have an input list of words. You check the suffix of the first word to the prefix of the next word.

Eg.

serene next tango extra

{serene,next}= 2common letters    {serene,tango}=0   {serene,extra}= 1
{next,serene}= 0      {next,tango}= 1   {next,extra}= 3
{tango,serene}=0       {tango,next}= 0  {tango,extra}= 0
{extra,serene}=0        {extra,next}=0   {extra,tango}=0   

You can also switch the order of the words i.e.(next, serene) if overlap letter score is better this way

so you check the overlap scores with each word and finally return the list of words with maximal score

Going by the input list the score is 1 serene,next,tango,extra = 1

Maximal Score is = 5 and the output list returned would be the following:

serine,next,extra,tango

serene,next= 2common letters    serene,tango=0   serene,extra= 1
next,serene= 0                  next,tango= 1   next,extra= 3
tango,serene=0                  tango,next= 0  tango,extra= 0
extra,serene=0                   extra,next=0   extra,tango=0   

What is the best way to calculate overlap score and return maximal score list in terms of complexity?

I am only able to calculate the overlap score for consecutive words, but that doesn't give maximal score.

Upvotes: 2

Views: 1383

Answers (2)

Serge Ballesta
Serge Ballesta

Reputation: 148965

I am not sure it is the most efficient approach, but I would compute the matrix of scores for any two consecutive words, and then simply use backtrack to find the longest possible chain.

Backtracking has a poor efficiency reputation, but in current use case, I think it can be used, because we can stop analyze as soon as 2 words have a score of 0. So I can find the correct maximal score 5 and the best sequence in 11 operations.

Code :

public class Overlap {
    int[][] matrix;
    int total;
    int [] bestSeq;
    String[] strings;

    /**
     * @param args the command line arguments
     */
    public static void main(String[] strings) {
        // TODO code application logic here
        Overlap overlap = new Overlap(strings);
        int score = overlap.compute();
        System.out.println("Best score : " + score);
        for (int i : overlap.bestSeq) {
            System.out.print(" " + strings[i]);
        }
        System.out.println(" in " + overlap.total + " operations");

    }

    public Overlap(String[] strings) {
        this.strings = strings;
        matrix = matrix(strings);
        bestSeq = new int[strings.length];
    }

    int compute() {
        total = 0;
        int[] sequence = new int[strings.length];
        for (int i=0; i < strings.length; i++) {
            sequence[i] = i;
        }
        return this.bestSequence(-1, sequence, bestSeq);
    }

    static int findOverlap(String a, String b) {
        int la = a.length();
        int l = Math.min(la, b.length());
        while (l > 0) {
            if (a.substring(la - l).equals(b.substring(0, l))) {
                return l;
            }
            l--;
        }
        return 0;
    }

    static int[][] matrix(String[] strings) {
        int l = strings.length;
        int[][] mx = new int[l][l];
        for (int i = 0; i < l - 1; i++) {
            for (int j = i + 1; j < l; j++) {
                mx[i][j] = findOverlap(strings[i], strings[j]);
            }
        }
        return mx;
    }

    int bestSequence(int initial, int[] sequence, int[] best) {
        total += 1;
        int max = 0;
        if (best.length != sequence.length) {
            throw new java.lang.IllegalArgumentException();
        }
        int l = sequence.length;
        int[] newseq = new int[l - 1];
        int[] newbest = new int[l - 1];
        for (int i : sequence) {
            int val = (initial == -1) ? 0 : matrix[initial][i];
            if ((val > 0) || (initial == -1)) {
                int k = 0;
                for (int j : sequence) {
                    if (j != i) {
                        newseq[k++] = j;
                    }
                }
                val += bestSequence(i, newseq, newbest);
                if (val > max) {
                    max = val;
                    best[0] = i;
                    System.arraycopy(newbest, 0, best, 1, l - 1);
                }
            }
        }
        if (max == 0) {
            System.arraycopy(sequence, 0, best, 0, l);
        }
        return max;
    }
}

With the arguments serene next tango extra, it prints :

Best score : 5
 serene next extra tango
in 11 operations

Upvotes: 0

StackFlowed
StackFlowed

Reputation: 6816

You can add all the letters in a list and then do retainAll like:

String one="next", two="extra";
List<Character> oneList=new ArrayList<Character>();
for(Character c : one.toCharArray()) {
    oneList.add(c);
}
List<Character> twoList=new ArrayList<Character>();
for(Character c : two.toCharArray()) {
    twoList.add(c);
}
List<Character> finalList = new ArrayList<Character>(oneList);
finalList.retainAll(twoList);
System.out.print("There are "+finalList.size()+ " letters in common and they are : ");
for(Character c: finalList){
    System.out.print(c+" ");
}

Unfortunately I don't know a better way to convert primitive data type into list other that using Google Guava library or other 3 party API's. If you want to optimize the code then looking them.

Upvotes: 1

Related Questions