Reputation: 607
I am trying to find the largest substrings which are in both strings (minimum length of 3). So if I have:
String test1 = "testthatthisworks";
String test2 = "testthisthat";
I need the answer to be:
String[] Answer = ["test", "that", "this"];
My one problem is this needs to be as fast as possible. My current solution is to with a substring of length 3 from the smallest string, and then to see if this exists in the bigger string, if it does increment the size of the substring, if not move the substring along 1 point. The problem is that this is very slow as the strings grow in length. Does anyone have a solution to this problem?
Thanks
Upvotes: 0
Views: 318
Reputation: 26
This is a modification of the LCS algorithm
, it will return all maximum length matches
of maximum size:
public static Collection<String> longestCommonSubstrings(String S1, String S2){
return longestCommonSubstrings(S1, S2, 0);
}
public static Collection<String> longestCommonSubstrings(String S1, String S2, int minimumLength){
Collection<Integer> indexes = new ArrayList<Integer>();
int Max = minimumLength;
for (int i = 0; i < S1.length(); i++){
for (int j = 0; j < S2.length(); j++){
int x = 0;
int y = Math.min(S1.length()-i,S2.length()-j);
while (x < y && (S1.charAt(i + x) == S2.charAt(j + x) )){
x++;
}
if (x > Max){
Max = x;
indexes = new ArrayList<Integer>();
indexes.add(i);
}else if (x == Max){
indexes.add(i);
}
}
}
Collection<String> results = new HashSet<String>();
for (Integer i : indexes){
results.add(S1.substring(i, (i + Max)));
}
return results;
}
Upvotes: 1
Reputation: 25950
Search for Longest Common Subsequence(LCS) problems and algorithms. You will get a lot of hints from the implementation of an algorithm of finding LCS of two strings. Here is an example: http://introcs.cs.princeton.edu/java/96optimization/LCS.java.html
If you trace an LCS algorithm carefully, it retrieves all the common substrings till it finds the longest one. Thus you can add some code to collect these substrings by checking their length, i.e. length > 3.
Upvotes: 1