Reputation: 122450
Given 2 strings, I want to find the first match of at least four characters.
This is the code I currently have to do it. It works correctly, but I'm thinking there may be a better way to do this. Are there any screaming inefficiencies or bad practices in what I'm doing? Are there common libraries, like Apache Commons, that I should be taking advantage of but I'm not?
Don't worry about the Gene
class - it just contains the String in question. Also - GeneMatch()
signifies no match exists, whereas the GeneMatch
constructor with arguments signifies a match has been found.
Constants.MIN_MATCH
== 4, in this case.
public static GeneMatch findMatch(Gene g0, Gene g1) {
String g0DNA = g0.getDNA();
String g1DNA = g1.getDNA();
if (g0DNA.equals("") || g1DNA.equals("")) { //there won't be a match if one is empty
return new GeneMatch();
}
int g0Left = -1;
int g0Right = -1;
int g1Left = -1;
int g1Right = -1;
String window;
for (int inx = 0; inx <= g0DNA.length() - Constants.MIN_MATCH; inx++) {
window = g0DNA.substring(inx, inx + Constants.MIN_MATCH);
if (g1DNA.indexOf(window) != -1) {
g0Left = inx;
g0Right = inx + Constants.MIN_MATCH;
g1Left = g1DNA.indexOf(window);
g1Right = g1Left + Constants.MIN_MATCH;
/* grow the match to the right
* while the two right indices are less than the lengths of their respective strings, and the
* characters at the indices match, increment each index
*/
while (g0Right < g0DNA.length() && g1Right < g1DNA.length() && g0DNA.charAt(g0Right) == g1DNA.charAt(g1Right)) {
g0Right++;
g1Right++;
}
break; //we've already found a match, no need to continue sliding the window
}
}
//now that the indices are found, convert to Genes
if (g0Left == -1 || g0Right == -1 || g1Left == -1 || g1Right == -1) { //no match found
return new GeneMatch();
}
Gene gL0 = new Gene(g0DNA.substring(0, g0Left));
Gene gL1 = new Gene(g1DNA.substring(0, g1Left));
Gene g0match = new Gene(g0DNA.substring(g0Left, g0Right));
Gene g1match = new Gene(g1DNA.substring(g1Left, g1Right));
Gene gR0 = new Gene(g0DNA.substring(g0Right));
Gene gR1 = new Gene(g1DNA.substring(g1Right));
//sanity check
assert g0DNA.equals(gL0.getDNA() + g0match.getDNA() + gR0.getDNA()) : "g0 didn't add up";
assert g1DNA.equals(gL1.getDNA() + g1match.getDNA() + gR1.getDNA()) : "g1 didn't add up";
return new GeneMatch(gL0, gR0, g0match, g1match, gL1, gR1);
}
Upvotes: 0
Views: 560
Reputation: 1292
Looks good to me. One might go ahead and micro optimize in terms of assignments, but this is the job of the JIT compiler. If you feel the algorithm is too slow, try to profile it.
Upvotes: 0
Reputation: 3744
Current approach
Improvements
General consideration There is no 'the one best algorithm' because 'the best' selection depends on input data profile and algorithm usage policy. I.e. if the algorithm is executed rarely and its performance impact is insignificant there is no point in spending a lot of time to its optimization and much better to write a simple code that is easy to maintain. If input strings are rather short there is no point in building characters index etc. In general just try to avoid preliminary optimization whenever possible and carefully consider all input data during choosing resulting algorithm if you really have a bottleneck there.
Upvotes: 2
Reputation: 36095
Looks quite okay to me. Just two minor things:
reuse the result of g1DNA.indexOf(window)
instead of calling it twice (g1Left = g1DNA.indexOf(window);
)
you don't have to check all 4 vars for being == -1 as you all set them at once anyway.
Upvotes: 1