Reputation: 81
My goal is to delete all matches from an input using a regular expression with Java 7:
input.replaceAll([regex], "");
Given this example input, with a target string abc-:
<TAG>test-test-abc-abc-test-abc-test-</TAG>test-abc-test-abc-<TAG>test-abc-test-abc-abc-</TAG>
What regex could I use in the code above to match abc- only when it is between the <TAG> and </TAG> delimiters? Here is the desired matching behaviour, with <--> for a match:
               <--><-->     <-->                                       <-->     <--><-->
<TAG>test-test-abc-abc-test-abc-test-</TAG>test-abc-test-abc-<TAG>test-abc-test-abc-abc-</TAG>
Expected result:
<TAG>test-test-test-test-</TAG>test-abc-test-abc-<TAG>test-test-</TAG>
The left and right delimiters are always different. I am not particularly looking for a recursive solution (nested delimiters).
I think this might be doable with lookaheads and/or lookbehinds but I didn't get anywhere with them.
Upvotes: 0
Views: 76
Reputation: 626870
You can use a regex like
(?s)(\G(?!^)|<TAG>(?=.*?</TAG>))((?:(?!<TAG>|</TAG>).)*?)abc-
See the regex demo. Replace with $1$2. Details:
(?s) - a Pattern.DOTALL embedded flag option
(\G(?!^)|<TAG>(?=.*?</TAG>)) - Group 1 ($1): either of the two:
    \G(?!^) - end of the previous successful match
    | - or
    <TAG>(?=.*?</TAG>) - <TAG> that is immediately followed with any zero or more chars, as few as possible, followed with </TAG> (thus, we make sure there is actually a closing, right-hand delimiter further in the string)
((?:(?!<TAG>|</TAG>).)*?) - Group 2 ($2): any one char (.), zero or more repetitions, but as few as possible (*?), that does not start a <TAG> or </TAG> char sequence (aka tempered greedy token)
abc- - the pattern to be removed, abc-.
In Java:
String pattern = "(?s)(\\G(?!^)|<TAG>(?=.*?</TAG>))((?:(?!<TAG>|</TAG>).)*?)abc-";
String result = text.replaceAll(pattern, "$1$2");
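To check the pattern against the sample input from the question, here is a minimal self-contained sketch. The class name TagScopedRemoval and the method name removeInsideTags are made up for illustration; only the regex and the $1$2 replacement come from the answer above. It sticks to Java 7 syntax.

```java
import java.util.regex.Pattern;

public class TagScopedRemoval {

    // Same pattern as in the answer, compiled once so it can be reused.
    // Group 1: end of the previous match, or an opening <TAG> that has a closing </TAG> ahead.
    // Group 2: text that does not cross a <TAG> or </TAG> boundary, up to the next "abc-".
    private static final Pattern INSIDE_TAGS = Pattern.compile(
            "(?s)(\\G(?!^)|<TAG>(?=.*?</TAG>))((?:(?!<TAG>|</TAG>).)*?)abc-");

    // Hypothetical helper: removes every "abc-" that sits between <TAG> and </TAG>.
    public static String removeInsideTags(String text) {
        // Put back what groups 1 and 2 consumed, so only "abc-" is dropped.
        return INSIDE_TAGS.matcher(text).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        String input = "<TAG>test-test-abc-abc-test-abc-test-</TAG>"
                + "test-abc-test-abc-"
                + "<TAG>test-abc-test-abc-abc-</TAG>";
        // Should print the expected result given in the question.
        System.out.println(removeInsideTags(input));
    }
}
```

If the delimiters or the target string come from user input, it is probably safer to build the pattern with Pattern.quote around each literal part, since characters such as . or ( would otherwise be read as regex syntax.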
Upvotes: 1