Reputation: 8434
Say I have a tag <tag> and I want to match groups of <tag>...<tag> in my string. I can use a regular expression along the lines of <tag>.*<tag>. This matches <tag>foo<tag>, which is good, but it also matches <tag>foo<tag>bar<tag>, which is behavior I don't want. I want the <tag>foo<tag> to be matched, then bar to be excluded, and then the tag on the end to be the start of the next match. How do I do this?
Upvotes: 1
Views: 875
Reputation: 336378
The simplest solution is to use a lazy quantifier: the ? forces the .* to match as few characters as possible (rather than as many as possible, which is what the unadorned .* tries to do):
<tag>.*?<tag>
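To make the difference concrete, here is a minimal sketch in Python (the question does not name a regex flavor, so the use of Python's re module is an assumption). It runs the greedy and lazy patterns against the string from the question:

```python
import re

# Sample string taken from the question.
text = "<tag>foo<tag>bar<tag>"

# Greedy: .* grabs as much as possible, so the match runs to the last <tag>.
greedy = re.findall(r"<tag>.*<tag>", text)

# Lazy: .*? grabs as little as possible, so the match stops at the next <tag>.
lazy = re.findall(r"<tag>.*?<tag>", text)

print(greedy)  # ['<tag>foo<tag>bar<tag>']
print(lazy)    # ['<tag>foo<tag>']
```

Note that bar is left out of the lazy result: after the first match, only one <tag> remains in the string, so there is nothing left to pair it with.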
A safer, more explicit solution is to use a negative lookahead assertion:
<tag>(?:(?!<tag>).)*<tag>
In the current case the two patterns behave identically, but the second one can be extended to handle open/close tags, making sure that nested tags aren't incorrectly matched:
<tag>(?:(?!</?tag>).)*</tag>
When applied to <tag>foo<tag>bar</tag>baz</tag>, this will match <tag>bar</tag>, and not <tag>foo<tag>bar</tag> as a solution with a lazy quantifier would.
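The nested-tag claim can be checked with a short sketch, again assuming Python's re module. The lazy pattern <tag>.*?</tag> used for comparison is not spelled out above; it is the obvious open/close counterpart of the lazy solution and is included here only for illustration:

```python
import re

# Nested sample string from the answer.
text = "<tag>foo<tag>bar</tag>baz</tag>"

# Lazy open/close pattern (illustrative counterpart of the lazy solution).
lazy = re.compile(r"<tag>.*?</tag>")

# Lookahead pattern from the answer: each consumed character must not be
# the start of an opening or closing tag.
guarded = re.compile(r"<tag>(?:(?!</?tag>).)*</tag>")

lazy_matches = lazy.findall(text)
guarded_matches = guarded.findall(text)

print(lazy_matches)     # ['<tag>foo<tag>bar</tag>']
print(guarded_matches)  # ['<tag>bar</tag>']
```

The lazy version starts at the outer <tag> and stops at the first </tag>, which belongs to the inner pair. The lookahead version cannot cross the inner <tag>, so its first attempt fails and it matches only the innermost pair.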
Upvotes: 7
Reputation: 71578
Use the lazy version of .*, that is:
<tag>.*?<tag>
       ^
The ? makes the .* match only up to the first occurrence of <tag>.
Upvotes: 2