Reputation: 301
Is there some reason \s+ works after a negative lookbehind while \s* does not?
Here's a minimal, complete, verifiable example:
select regexp_extract(
    'the hay barn has cabins and hay.'
    , concat(
        '(?<!\\b(?:hay)\\b)'
        ,'\\s+'
        ,'('
        ,'\\b(?:antique|historic|bungalow|cabin|barn)\\b'
        ,')'
    ), 1),
    'the hay barn has cabins and hay.'
In this case \s+ gives the result I expect, but \s* does not. Why?
Upvotes: 1
Views: 200
Reputation: 301
One solution is to put a bounded-width \\s{0,x} (with x a fixed upper limit) inside the negative lookbehind.
Upvotes: 0
Reputation: 626748
The (?<!\b(?:hay)\b)\s*(\b(?:antique|historic|bungalow|cabin|barn)\b) pattern matches barn in the hay barn has barns with cottages and hay.
That happens because of backtracking. The (?<!\b(?:hay)\b) lookbehind fails the match if there is a whole word hay immediately before the current position, so the position right after the first hay is skipped and the regex engine goes on to check the position after the space. There is no whole word hay immediately to the left of that location, so the lookbehind succeeds. Because of the * quantifier, \s* can then match zero whitespace characters, so it succeeds as well, as do the subsequent subpatterns.
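The behaviour described above can be reproduced with a small sketch. It uses Python's `re` module rather than the Hive engine from the question, on the assumption that both treat these particular patterns the same way; the variable names are made up for illustration.

```python
import re

text = "the hay barn has cabins and hay."
words = r"(\b(?:antique|historic|bungalow|cabin|barn)\b)"

# Same lookbehind, followed by \s* in one pattern and \s+ in the other.
star = re.compile(r"(?<!\b(?:hay)\b)\s*" + words)
plus = re.compile(r"(?<!\b(?:hay)\b)\s+" + words)

m_star = star.search(text)
m_plus = plus.search(text)

# With \s*, the attempt starting at the "b" of "barn" passes the lookbehind
# (the three characters before it are "ay ", not "hay") and \s* matches nothing.
print(m_star.group(1), m_star.start())  # barn 8

# With \s+, the match has to start at the space, where "hay" is right behind,
# and "cabins" is rejected by the trailing \b, so nothing matches.
print(m_plus)  # None
```

The start offset of 8 shows that the match begins at the word itself, not at the space after hay.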
Note that a lookbehind placed before a subpattern that can match an empty string is not useful, since the engine can simply start the match one position further on. You might use (?<!\bhay\s)(\b(?:antique|historic|bungalow|cabin|barn)\b) (note the non-quantified \s after \bhay), but it will not exclude the word if there are 2 or more whitespace characters between hay and it.
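A quick check of that limitation, again sketched with Python's `re` as a stand-in for the engine in the question (the sample strings are invented for the test):

```python
import re

pattern = re.compile(
    r"(?<!\bhay\s)(\b(?:antique|historic|bungalow|cabin|barn)\b)"
)

one_space = pattern.search("the hay barn")
two_spaces = pattern.search("the hay  barn")

# One whitespace character: "hay " sits directly behind "barn", so it is excluded.
print(one_space)  # None

# Two whitespace characters: the four characters behind "barn" are "ay  ",
# so the lookbehind no longer sees "hay" and the word is matched anyway.
print(two_spaces.group(1))  # barn
```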
A more practical workaround is to use a regex that matches, without capturing, the words you need when they follow hay, and that matches and captures those words in all other contexts. Something like
\bhay\s*(?:antique|historic|bungalow|cabin|barn)\b|\b(antique|historic|bungalow|cabin|barn)\b
See another regex demo. All the matches you need will be in Group 1.
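As a rough illustration of how Group 1 is used, here is a sketch in Python's `re` (an assumption, since the question runs on Hive; the sample sentence and variable names are invented). Matches from the hay branch leave Group 1 unset, so they are filtered out.

```python
import re

pattern = re.compile(
    r"\bhay\s*(?:antique|historic|bungalow|cabin|barn)\b"
    r"|\b(antique|historic|bungalow|cabin|barn)\b"
)

text = "the hay barn has a cabin and a historic barn"

# Keep only matches where the capturing (non-hay) branch participated.
wanted = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]

print(wanted)  # ['cabin', 'historic', 'barn']
```

The first barn is consumed by the hay branch, so only the second one, which follows historic, ends up in the result.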
Upvotes: 1