Kevin Eder

Reputation: 309

Regex negative lookbehinds with a wildcard

I'm trying to match some text only if another block of text is not in its vicinity. For example, I would like to match "bar" if "foo" does not precede it. I can match "bar" if "foo" does not immediately precede it by using a negative lookbehind in this regex:

/(?<!foo)bar/

but I would also like to not match "foo 12345 bar". I tried:

/(?<!foo.{1,10})bar/

but using a wildcard + a range appears to be an invalid regex in Ruby. Am I thinking about the problem wrong?

Upvotes: 9

Views: 6245

Answers (2)

sawa

Reputation: 168091

As m.buettner already mentions, a lookbehind in Ruby regex has to be of fixed length, and the documentation says so. So you cannot put a variable-length quantifier such as {1,10} within a lookbehind.

You don't need to check everything in one step. Try doing multiple regex matches to get what you want. Assuming that a foo in front of any single instance of bar breaks the condition, regardless of whether there is another bar, then

string.match(/bar/) and !string.match(/foo.*bar/)

will give you what you want for the example.

If you would rather have the match succeed for bar foo bar, then you can do this:

string.scan(/foo|bar/).first == "bar"
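
To make the difference between the two checks concrete, here is a minimal sketch that wraps each one in a method. The method names are made up for illustration and are not part of the answer. Note that neither check enforces the 10-character limit from the question: a foo at any distance before a bar counts (on the same line, since . does not match a newline by default).

```ruby
# Hypothetical helper names, chosen only for this illustration.

# True when the string contains "bar" and no "foo" occurs before any "bar".
def no_foo_before_any_bar?(string)
  string.match?(/bar/) && !string.match?(/foo.*bar/)
end

# True when the first of "foo" / "bar" to appear in the string is "bar".
def bar_comes_first?(string)
  string.scan(/foo|bar/).first == "bar"
end

p no_foo_before_any_bar?("foo 12345 bar")  # false
p no_foo_before_any_bar?("bar foo bar")    # false, the second bar has a foo before it
p bar_comes_first?("bar foo bar")          # true, the first token found is bar
```

The two only disagree on inputs such as bar foo bar, where a clean bar is followed later by a foo ... bar pair.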

Upvotes: 4

Martin Ender

Reputation: 44259

You are thinking about it the right way. Unfortunately, lookbehinds usually have to be of fixed length. The only major exception to that is .NET's regex engine, which allows repetition quantifiers inside lookbehinds. But since you only need a negative lookbehind and not a lookahead as well, there is a hack for you. Reverse the string, then try to match:

/rab(?!.{0,10}oof)/

Then reverse the result of the match or subtract the matching position from the string's length, if that's what you are after.
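
As a rough sketch of that reverse-and-subtract step in Ruby (the method name bar_offsets is hypothetical, and it assumes you want the start offsets of the surviving bar matches in the original string):

```ruby
# Hypothetical helper: returns the start offsets (in the ORIGINAL string)
# of every "bar" that has no "foo" ending within 10 characters before it.
def bar_offsets(string)
  reversed = string.reverse
  offsets = []
  # In the reversed string the lookbehind becomes a lookahead,
  # which is allowed to contain a variable-length quantifier.
  reversed.scan(/rab(?!.{0,10}oof)/) do
    m = Regexp.last_match
    # The end of "rab" in the reversed string corresponds to the
    # start of "bar" in the original string.
    offsets << string.length - m.end(0)
  end
  offsets.sort
end

p bar_offsets("foo 12345 bar")          # []
p bar_offsets("bar foo bar")            # [0]
p bar_offsets("foo 1234567890123 bar")  # [18], foo is more than 10 characters away
```

Offsets here are character offsets, since String#reverse works on characters rather than bytes.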

Now from the regex you have given, I suppose that this was only a simplified version of what you actually need. Of course, if bar is a complex pattern itself, some more thought needs to go into how to reverse it correctly.

Note that if your pattern required both variable-length lookbehinds and lookaheads, you would have a harder time solving this. Also, in your case, it would be possible to deconstruct your lookbehind into multiple fixed-length ones (because you use neither + nor *):

/(?<!foo)(?<!foo.)(?<!foo.{2})(?<!foo.{3})(?<!foo.{4})(?<!foo.{5})(?<!foo.{6})(?<!foo.{7})(?<!foo.{8})(?<!foo.{9})(?<!foo.{10})bar/

But that's not all that nice, is it?
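
If you do go with the chained lookbehinds, you could at least generate them instead of typing them out. A minimal sketch, where MAX_GAP, LOOKBEHINDS and PATTERN are names made up for this example:

```ruby
# Hypothetical constants, introduced only for this illustration.
MAX_GAP = 10

# One fixed-length negative lookbehind per allowed gap size:
# (?<!foo)(?<!foo.{1})(?<!foo.{2}) ... (?<!foo.{10})
LOOKBEHINDS = (0..MAX_GAP).map do |n|
  n.zero? ? "(?<!foo)" : "(?<!foo.{#{n}})"
end.join

PATTERN = Regexp.new(LOOKBEHINDS + "bar")

p PATTERN.match?("bar")                    # true
p PATTERN.match?("foo 12345 bar")          # false
p PATTERN.match?("foo 1234567890123 bar")  # true, foo is more than 10 characters away
```

Each individual lookbehind has a fixed length, which is why the engine accepts the pattern; the cost is that the regex grows with MAX_GAP.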

Upvotes: 13
