Reputation: 629
I'm specifically using Ruby but I'm curious... say I'm trying to match a decimal point followed by at least five digits.
Here's the regexp: /(\.\d{5,})/
Without using a negative lookbehind, how would I make this only match if it follows either A) a space or tab or newline, or B) is the start of a string?
Upvotes: 2
Views: 3997
Reputation: 29677
Let's first consider how it would be done with a lookbehind. We just check that whatever precedes the captured text is either the start of a line or a whitespace character:
(?<=^|\s)(\.\d{5,})
We could simply change that lookbehind to a normal capture group.
That means a preceding whitespace character also gets captured. But in a replace we can choose whether or not to put capture group 1 back.
(^|\s)(\.\d{5,})
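As a minimal sketch of that replace in Ruby (the sample string and the "<NUM>" placeholder are made up for illustration), the block form of gsub can put capture group 1 back so the leading whitespace survives:

```ruby
# Hypothetical sample input: only the first and last numbers are
# preceded by start-of-line or whitespace.
text = ".12345 x.123456 .1234567"

# $1 holds what (^|\s) consumed (empty string or a whitespace character);
# re-inserting it keeps the original spacing intact.
result = text.gsub(/(^|\s)(\.\d{5,})/) { "#{$1}<NUM>" }

puts result
```

The ".123456" after "x" is left alone because neither ^ nor \s matches right before it.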
The PCRE regex engine has \K, and Ruby's own regex engine supports it as well (since Ruby 2.0):
\K : resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
So by using that \K in the regex, the preceding space isn't included in the match
(?:^|\s)\K(\.\d{5,})
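A quick sketch of \K in Ruby, assuming Ruby 2.0 or later (the sample string is reused from the scan example; "<NUM>" is just an illustrative placeholder). The capture group is dropped here so that scan returns the whole match:

```ruby
text = ".12345 .123456 NOT.1234567"

# Everything matched before \K is consumed but left out of the reported match.
kept = text.scan(/(?:^|\s)\K\.\d{5,}/)

# In a replace, the whitespace before \K is not part of the match,
# so it does not need to be re-inserted.
replaced = text.gsub(/(?:^|\s)\K\.\d{5,}/, "<NUM>")

p kept
puts replaced
```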
However, if you use Ruby's scan with a regex that has capture groups, it only returns the contents of the capture groups (...), not the non-capture groups (?:...) or anything else outside a capture group.
For example:
m = '.12345 .123456 NOT.1234567'.scan(/(?:^|\s)(\.\d{5,})/)
# => [[".12345"], [".123456"]]
m = 'ab123cd'.scan(/[a-z]+(\d+)(?:[a-z]+)/)
# => [["123"]]
So when you use scan, you don't need lookarounds: put the capture group around only the part you want returned.
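Since scan returns one inner array per match when the regex has capture groups, a flat list of strings can be had with flatten. A small sketch using the same sample string (the variable names are only for illustration):

```ruby
text = ".12345 .123456 NOT.1234567"

# Each match comes back as a one-element array holding capture group 1.
nested = text.scan(/(?:^|\s)(\.\d{5,})/)

# flatten collapses the inner arrays into a plain array of strings.
numbers = nested.flatten

p nested
p numbers
```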
Upvotes: 4