Reputation: 93
I am very new in programming and I am trying to solve one of longest common sequence/substring problems in Java. So the algorithm question I am working on is to find longest common substring without cutting words.
For instance: given string1 = He had 3 pennies and 5 quarters
and string2 = q3nniesp
should return pennies
.
Other example: string1 = They named the new place face cafe
and string2 = e face
and output will be e face cafe
.
I am trying to figure this algorithm out but I can't decide if I need to convert these to char array or evaluate it as string. The way both strings can have spaces are confusing me.
I followed some of the existing stackoverflow questions and trying to modify this code from https://www.geeksforgeeks.org/
:
static String findLongestSubsequence(String str1, String str2) {
char[] A = str1.toCharArray();
char[] B = str2.toCharArray();
if (A == null || B == null) return null;
final int n = A.length;
final int m = B.length;
if (n == 0 || m == 0) return null;
int[][] dp = new int[n+1][m+1];
// Suppose A = a1a2..an-1an and B = b1b2..bn-1bn
for (int i = 1; i <= n; i++ ) {
for (int j = 1; j <= m; j++) {
// If ends match the LCS(a1a2..an-1an, b1b2..bn-1bn) = LCS(a1a2..an-1, b1b2..bn-1) + 1
if (A[i-1] == B[j-1]) dp[i][j] = dp[i-1][j-1] + 1;
// If the ends do not match the LCS of a1a2..an-1an and b1b2..bn-1bn is
// max( LCS(a1a2..an-1, b1b2..bn-1bn), LCS(a1a2..an-1an, b1b2..bn-1) )
else dp[i][j] = Math.max(dp[i-1][j], dp[i][j-1]);
}
}
int lcsLen = dp[n][m];
char[] lcs = new char[lcsLen];
int index = 0;
// Backtrack to find a LCS. We search for the cells
// where we included an element which are those with
// dp[i][j] != dp[i-1][j] and dp[i][j] != dp[i][j-1])
int i = n, j = m;
while (i >= 1 && j >= 1) {
int v = dp[i][j];
// The order of these may output different LCSs
while(i > 1 && dp[i-1][j] == v) i--;
while(j > 1 && dp[i][j-1] == v) j--;
// Make sure there is a match before adding
if (v > 0) lcs[lcsLen - index++ - 1] = A[i-1]; // or B[j-1];
i--; j--;
}
return new String(lcs, 0, lcsLen);
}
But I keep getting wrong outputs. For instance first output gives output = 3nnies
, I am really stuck at this point, can anybody gives a hand or a little scratch of algorithm? Thank y'all.
Upvotes: 2
Views: 378
Reputation: 4090
I tried out your original algorithm, unfortunately, it was not in the right direction.
I am under the assumptions that the following guidelines apply:
I, therefore, took the liberty of using a brute force algorithm while using java streams:
// complexity of O(N*M), where N is the length of the original string and M is the length of the substring
static String longestCommonSequence(String string, String sub) {
List<Character> primaryMatch = new ArrayList<>();
List<Character> secondaryMatch = new ArrayList<>();
// N iterations loop on original string
for(char c : string.toCharArray()) {
// M iterations loop on the substring
if(sub.indexOf(c) != -1) {
primaryMatch.add(c);
}
else {
if(!primaryMatch.isEmpty()) {
// replace secondaryMatch content with the current longet match
if(secondaryMatch.size() <= primaryMatch.size()) {
secondaryMatch.clear();
secondaryMatch.addAll(primaryMatch);
}
primaryMatch.clear();
}
}
}
if(primaryMatch.size() < secondaryMatch.size()) {
return secondaryMatch.stream().map(String::valueOf).collect(Collectors.joining());
}
return primaryMatch.stream().map(String::valueOf).collect(Collectors.joining());
}
The outputs for the inputs you provided are:
string1 = He had 3 pennies and 5 quarters and string2 = q3nniesp ---> pennies
string1 = They named the new place face cafe and string2 = e face ---> ace face cafe
Note the difference in the second output - based on the output behavior you described the result of the above algorithm is the correct one, since ace face cafe
is a longer than e face cafe
, as multiple usages of characters from the given substring are allowed.
Note a subtle issue in the algorithm:
if(secondaryMatch.size() <= primaryMatch.size())
The current implementation will, in case of several matching substrings of the same length, return the last match (based on the order of characters in the original string). If you wish to return the first match, replace the <=
with <
.
If the assumptions I described are not allowed - please comment on this answer and I will update it according to your specifications.
Upvotes: 1