Kvass

Reputation: 8434

Ruby Regular Expression - prevent overlapping matches

Say I have a tag <tag> and I want to match groups of <tag>...<tag> in my string. I can use a regular expression along the lines of <tag>.*<tag>. This matches <tag>foo<tag>, which is good, but it also matches <tag>foo<tag>bar<tag>, which is behavior I don't want. I want the <tag>foo<tag> to be matched, then bar to be excluded, and then the tag on the end to be the start of the next match. How do I do this?

Upvotes: 1

Views: 875

Answers (2)

Tim Pietzcker

Reputation: 336378

The simplest solution is to use a lazy quantifier: the ? forces the .* to match as few characters as possible (rather than as many as possible, which is what the unadorned .* tries to do):

<tag>.*?<tag>
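
To see the difference in Ruby, here is a minimal sketch that runs both patterns through String#scan. The sample string and the variable names are made up for illustration and are not from the question:

```ruby
# Hypothetical sample input: four <tag> markers in a row.
text = "<tag>foo<tag>bar<tag>baz<tag>"

# Greedy: .* runs to the end of the line and backs up to the LAST <tag>.
greedy = text.scan(/<tag>.*<tag>/)

# Lazy: .*? stops at the FIRST <tag> it can reach, so "bar" is skipped
# and the third <tag> becomes the start of the next match.
lazy = text.scan(/<tag>.*?<tag>/)

p greedy  # => ["<tag>foo<tag>bar<tag>baz<tag>"]
p lazy    # => ["<tag>foo<tag>", "<tag>baz<tag>"]
```

Note that in Ruby the . does not match a newline unless the /m flag is given, so a tag pair spanning several lines would need /<tag>.*?<tag>/m.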

A safer, more explicit solution is to use a negative lookahead assertion:

<tag>(?:(?!<tag>).)*<tag>

While there is no difference in behavior in the current case, the second one can be extended to handle open/close tags, making sure that nested tags aren't incorrectly matched:

<tag>(?:(?!</?tag>).)*</tag>

when applied to <tag>foo<tag>bar</tag>baz</tag> will match <tag>bar</tag>, and not <tag>foo<tag>bar</tag> as a solution with a lazy quantifier would.
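
A short Ruby sketch of that comparison, using the same sample string as above. The variable names are illustrative only, and %r{...} is used so the slashes need no escaping:

```ruby
nested = "<tag>foo<tag>bar</tag>baz</tag>"

# Lookahead version: each character is consumed only if it does not
# begin an opening or closing tag, so the match cannot span a nested <tag>.
tempered = nested[%r{<tag>(?:(?!</?tag>).)*</tag>}]

# Lazy version: starts at the first <tag> and runs to the nearest </tag>,
# swallowing the inner opening tag along the way.
lazy_match = nested[%r{<tag>.*?</tag>}]

p tempered    # => "<tag>bar</tag>"
p lazy_match  # => "<tag>foo<tag>bar</tag>"
```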

Upvotes: 7

Jerry

Reputation: 71578

Use the lazy version of .*, that is:

<tag>.*?<tag>
       ^

The ? makes the .* match only up to the first occurrence of <tag>.

Upvotes: 2
