James
James

Reputation: 607

Get the largest substrings which are the same in both strings

I am trying to find the largest substrings which are in both strings (minimum length of 3). So if I have:

String test1 = "testthatthisworks";
String test2 = "testthisthat";

I need the answer to be:

String[] Answer = ["test", "that", "this"];

My one problem is this needs to be as fast as possible. My current solution is to with a substring of length 3 from the smallest string, and then to see if this exists in the bigger string, if it does increment the size of the substring, if not move the substring along 1 point. The problem is that this is very slow as the strings grow in length. Does anyone have a solution to this problem?

Thanks

Upvotes: 0

Views: 318

Answers (3)

BevynQ
BevynQ

Reputation: 26

This is a modification of the LCS algorithm, it will return all maximum length matches of maximum size:

public static Collection<String> longestCommonSubstrings(String S1, String S2){
  return longestCommonSubstrings(S1, S2, 0);
}

public static Collection<String> longestCommonSubstrings(String S1, String S2, int minimumLength){

Collection<Integer> indexes = new ArrayList<Integer>();
int Max = minimumLength;

for (int i = 0; i < S1.length(); i++){
  for (int j = 0; j < S2.length(); j++){
    int x = 0;
    int y = Math.min(S1.length()-i,S2.length()-j);
    while (x < y && (S1.charAt(i + x) == S2.charAt(j + x) )){
      x++;
    }
    if (x > Max){
      Max = x;
      indexes = new ArrayList<Integer>();
      indexes.add(i);
    }else if (x == Max){
      indexes.add(i);
    }
  }
}
Collection<String> results = new HashSet<String>();
for (Integer i : indexes){
  results.add(S1.substring(i, (i + Max)));
}
return results;
}

Upvotes: 1

Soufiane Hassou
Soufiane Hassou

Reputation: 17750

Your are looking for the Longest common substring

Java Implementation

Upvotes: 2

Juvanis
Juvanis

Reputation: 25950

Search for Longest Common Subsequence(LCS) problems and algorithms. You will get a lot of hints from the implementation of an algorithm of finding LCS of two strings. Here is an example: http://introcs.cs.princeton.edu/java/96optimization/LCS.java.html

If you trace an LCS algorithm carefully, it retrieves all the common substrings till it finds the longest one. Thus you can add some code to collect these substrings by checking their length, i.e. length > 3.

Upvotes: 1

Related Questions