Reputation: 1411
For some reason this piece of Java code is giving me overlapping matches:
Pattern pat = Pattern.compile("(" + leftContext + ")" + ".*" + "(" + rightContext + ")", Pattern.DOTALL);
Is there any way or option to make it avoid detecting overlaps? E.g., leftContext rightContext rightContext should be 1 match instead of 2.
Here's the complete code:
public static String replaceWithContext(String input, String leftContext, String rightContext, String newString){
Pattern pat = Pattern.compile("(" + leftContext + ")" + ".*" + "(" + rightContext + ")", Pattern.DOTALL);
Matcher matcher = pat.matcher(input);
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
matcher.appendReplacement(buffer, "");
buffer.append(matcher.group(1) + newString + matcher.group(2));
}
matcher.appendTail(buffer);
return buffer.toString();
}
So here's the final answer, using a negative lookahead. My bad for not realizing that * is greedy:
Pattern pat = Pattern.compile("(" +
leftContext + ")" + "(?:(?!" +
rightContext + ").)*" + "(" +
rightContext + ")", Pattern.DOTALL);
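To make the difference concrete, here is a small sketch comparing the greedy pattern with the negative-lookahead one. The class name, the `findAll` helper, and the `<b>`/`</b>` contexts are made up for illustration; they are not from my real code, and the contexts were picked because they contain no regex metacharacters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsTempered {
    // Hypothetical helper: collects every full match of the regex in the input.
    public static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String left = "<b>";   // assumed sample contexts, no metacharacters
        String right = "</b>";
        String input = "<b>one</b> and <b>two</b>";

        // Greedy: .* runs to the end and backtracks to the LAST rightContext.
        String greedy = "(" + left + ").*(" + right + ")";
        // Lookahead: each character is consumed only if rightContext does not start there.
        String tempered = "(" + left + ")(?:(?!" + right + ").)*(" + right + ")";

        System.out.println(findAll(greedy, input));   // [<b>one</b> and <b>two</b>]
        System.out.println(findAll(tempered, input)); // [<b>one</b>, <b>two</b>]
    }
}
```

The greedy version reports one big match spanning both pairs, while the lookahead version stops at the first rightContext after each leftContext.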
Upvotes: 4
Views: 1215
Reputation: 75222
Your use of the word "overlapping" is confusing. Apparently, what you meant was that the regex is too greedy, matching everything from the first leftContext to the last rightContext. It seems you figured that out already--and came up with a better approach as well--but there's still at least one potential problem.
You said leftContext and rightContext are "plain Strings", by which I assume you meant they aren't supposed to be interpreted as regexes, but they will be. You need to escape them, or any regex metacharacters they contain will cause incorrect results or run-time exceptions. The same goes for your replacement string, although only $ and the backslash have special meanings there. Here's an example (notice the non-greedy .*?, too):
public static String replaceWithContext(String input, String leftContext, String rightContext, String newString){
String lcRegex = Pattern.quote(leftContext);
String rcRegex = Pattern.quote(rightContext);
String replace = Matcher.quoteReplacement(newString);
Pattern pat = Pattern.compile("(" + lcRegex + ").*?(" + rcRegex + ")", Pattern.DOTALL);
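If it helps to see why the quoting matters, here is a rough sketch. The class and the two helper methods are invented for the demonstration, and the sample contexts are assumptions, not taken from your code.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class QuoteDemo {
    // Hypothetical helper: uses the context as a raw regex, the way the question's code does.
    public static boolean foundRaw(String context, String input) {
        return Pattern.compile("(" + context + ")").matcher(input).find();
    }

    // Same check, but the context is quoted so it is matched literally.
    public static boolean foundQuoted(String context, String input) {
        return Pattern.compile("(" + Pattern.quote(context) + ")").matcher(input).find();
    }

    public static void main(String[] args) {
        // Incorrect result: the unescaped dot acts as a wildcard.
        System.out.println(foundRaw("a.c", "abc"));     // true
        System.out.println(foundQuoted("a.c", "abc"));  // false

        // Run-time exception: an unescaped "(" leaves a group unclosed.
        try {
            foundRaw("(", "f(x)");
        } catch (PatternSyntaxException e) {
            System.out.println("rejected: " + e.getDescription());
        }
        System.out.println(foundQuoted("(", "f(x)"));   // true
    }
}
```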
One other thing: if you aren't doing any post-match processing on the matched text, you can use replaceAll instead of rolling your own with appendReplacement and appendTail:
return input.replaceAll("(?s)(" + lcRegex + ")" +
"(?:(?!" + rcRegex + ").)*" +
"(" + rcRegex + ")",
"$1" + replace + "$2");
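Put together as one compilable unit, the whole method might look roughly like this. The wrapper class name and the sample inputs in main are my own assumptions; the method body just combines the quoting and replaceAll pieces from above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContextReplacer {
    public static String replaceWithContext(String input, String leftContext,
                                            String rightContext, String newString) {
        // Quote both contexts so they are matched as literal text.
        String lcRegex = Pattern.quote(leftContext);
        String rcRegex = Pattern.quote(rightContext);
        // Quote the replacement so $ and backslash are inserted literally.
        String replace = Matcher.quoteReplacement(newString);
        // (?s) is the inline form of Pattern.DOTALL.
        return input.replaceAll("(?s)(" + lcRegex + ")" +
                                "(?:(?!" + rcRegex + ").)*" +
                                "(" + rcRegex + ")",
                                "$1" + replace + "$2");
    }

    public static void main(String[] args) {
        // Sample input chosen to contain regex and replacement metacharacters.
        System.out.println(replaceWithContext("f(a) + f(b)", "(", ")", "$x"));
        // prints f($x) + f($x)
    }
}
```

Note that with an input like "L mid R R" and contexts "L" and "R", this replaces only up to the first R and leaves the trailing " R" untouched.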
Upvotes: 2
Reputation: 47183
There are a few possibilities, depending on what you really need.
You can append $ at the end of your regex, like this:
"(" + leftContext + ")" + ".*" + "(" + rightContext + ")$"
so if rightContext isn't the last thing, your regex won't match.
Next, you can capture everything after rightContext:
"(" + leftContext + ")" + ".*" + "(" + rightContext + ")(.*)"
and after that discard everything in your third matching group.
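As a rough sketch of how these two variants behave (the class, the method names, and the sample strings are all made up for illustration, and I've quoted the contexts with Pattern.quote on the assumption that they are plain text):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorAndTailDemo {
    // Variant 1: the trailing $ requires rightContext to end the input.
    public static boolean matchesAtEnd(String input, String left, String right) {
        String regex = "(" + Pattern.quote(left) + ").*(" + Pattern.quote(right) + ")$";
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).find();
    }

    // Variant 2: group 3 captures whatever follows the last rightContext.
    // Returns null when the pattern does not match at all.
    public static String tailAfterRight(String input, String left, String right) {
        String regex = "(" + Pattern.quote(left) + ").*(" + Pattern.quote(right) + ")(.*)";
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        System.out.println(matchesAtEnd("<b>one</b>", "<b>", "</b>"));        // true
        System.out.println(matchesAtEnd("<b>one</b> tail", "<b>", "</b>"));   // false
        System.out.println(tailAfterRight("<b>one</b> tail", "<b>", "</b>")); // " tail" (without the quotes)
    }
}
```

Because .* is still greedy in both variants, group 2 is always the last rightContext in the input.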
But, since we don't know what leftContext and rightContext really are, maybe your problem lies within them.
Upvotes: 1