user48944

Reputation: 301

Regex \\s* vs \\s+ After Lookbehind

Is there some reason \\s+ works after a negative lookbehind while \\s* does not?

Here's a minimal, complete, and verifiable example:

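select regexp_extract(
         'the hay   barn has cabins and hay.'
         , concat(
             '(?<!\\b(?:hay)\\b)'                                -- not right after the whole word "hay"
             , '\\s+'                                            -- one or more whitespace characters
             , '('
             , '\\b(?:antique|historic|bungalow|cabin|barn)\\b'  -- keyword, captured as group 1
             , ')'
           )
         , 1)
       , 'the hay   barn has cabins and hay.';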

Why does \\s+ work in this case while \\s* does not?
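The query can be reproduced outside the database with a short sketch. It assumes the query runs in Hive or Spark SQL, whose `regexp_extract` evaluates patterns with Java's regex engine. Python's `re` treats these particular patterns the same way. The helper `extract` is hypothetical: a rough stand-in for `regexp_extract(text, pattern, 1)` that returns `None` when nothing matches.

```python
import re

# Pieces of the pattern from the question (single backslashes: Python raw strings).
NOT_AFTER_HAY = r'(?<!\b(?:hay)\b)'
KEYWORDS = r'\b(?:antique|historic|bungalow|cabin|barn)\b'

def extract(text: str, ws: str):
    """Hypothetical stand-in for regexp_extract(text, pattern, 1): group 1 of the first match."""
    m = re.search(NOT_AFTER_HAY + ws + '(' + KEYWORDS + ')', text)
    return m.group(1) if m else None

one_space = 'the hay barn has cabins and hay.'
three_spaces = 'the hay   barn has cabins and hay.'

for text in (one_space, three_spaces):
    print(repr(text), 'plus:', extract(text, r'\s+'), 'star:', extract(text, r'\s*'))
```

With a single space, `\s+` finds nothing while `\s*` still returns `barn`. With the three spaces used in the query, `\s+` returns `barn` as well, because the match can begin at the second space, where the lookbehind no longer sees `hay`.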

Upvotes: 1

Views: 200

Answers (2)

user48944

Reputation: 301

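So one solution for this is to put a bounded-width \\s{0,x} inside the negative lookbehind, e.g. (?<!\\bhay\\s{0,x}), where x is the maximum number of whitespace characters expected between hay and the keyword.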
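Here is a rough sketch of that idea under stated assumptions. Java's regex engine, which Hive's `regexp_extract` relies on, accepts a bounded repetition such as `\s{0,3}` inside a lookbehind. Python's `re` only accepts fixed-width lookbehinds. The Python version below therefore expands `(?<!\bhay\s{0,3})` into one fixed-width negative lookbehind per whitespace count. The helper `not_after_hay` and the limit of 3 are illustrative choices, not part of the original answer.

```python
import re

KEYWORDS = r'\b(?:antique|historic|bungalow|cabin|barn)\b'

def not_after_hay(max_ws: int) -> str:
    """Hypothetical helper: emulate (?<!\bhay\s{0,max_ws}) with fixed-width lookbehinds."""
    parts = [r'(?<!\bhay\b)']  # zero whitespace: position directly after the word "hay"
    # One lookbehind per exact whitespace count 1..max_ws, e.g. (?<!\bhay\s{2}).
    parts += [rf'(?<!\bhay\s{{{k}}})' for k in range(1, max_ws + 1)]
    return ''.join(parts)

pattern = re.compile(not_after_hay(3) + r'\s*(' + KEYWORDS + ')')

print(pattern.search('the hay barn has cabins and hay.'))    # None
print(pattern.search('the hay   barn has cabins and hay.'))  # None
print(pattern.search('the old barn').group(1))               # barn
```

The bound is the price of this approach. With four or more spaces after `hay`, the match can start at a position none of the lookbehinds cover.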

Upvotes: 0

Wiktor Stribiżew

Reputation: 626748

The (?<!\b(?:hay)\b)\s*(\b(?:antique|historic|bungalow|cabin|barn)\b) pattern matches barn in the string "the hay barn has barns with cottages and hay."

That happens because of backtracking. The (?<!\b(?:hay)\b) lookbehind fails the match if there is a whole word hay immediately before the current position. So the position right after the first hay is skipped, and the regex engine moves on to check the position after the space. There is no whole word hay immediately to the left of that location, so the lookbehind succeeds. \s* can then match zero whitespace characters, so it succeeds, and so does the rest of the pattern.
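This position-by-position behaviour can be observed directly. The sketch uses Python's `re` as a stand-in for Java's engine (an assumption that holds for this pattern). `Pattern.match(text, pos)` tries a match at exactly one position, and a lookbehind there can still see the characters before `pos`.

```python
import re

text = 'the hay barn has barns with cottages and hay.'
pattern = re.compile(r'(?<!\b(?:hay)\b)\s*(\b(?:antique|historic|bungalow|cabin|barn)\b)')

# Index 7 is the space right after "hay": the negative lookbehind fails there.
print(pattern.match(text, 7))        # None

# At index 8 (after the space) the lookbehind sees "ay ", not the word "hay",
# so it succeeds and \s* matches zero characters.
m = pattern.search(text)
print(m.start(), repr(m.group(0)))   # 8 'barn'
```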

Note that placing a lookbehind before a quantified subpattern such as \s* does not help here, because the match can simply start later. You might use (?<!\bhay\s)(\b(?:antique|historic|bungalow|cabin|barn)\b) (note the non-quantified \s after \bhay), but it will fail if there are two or more whitespace characters before the expected match.
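That limitation can be checked with a quick sketch, again using Python's `re` as a stand-in for Java's engine. The lookbehind `(?<!\bhay\s)` has a fixed width of four characters, so it only blocks a keyword preceded by exactly `hay` plus one whitespace character.

```python
import re

pattern = re.compile(r'(?<!\bhay\s)(\b(?:antique|historic|bungalow|cabin|barn)\b)')

one_space = pattern.search('the hay barn has cabins and hay.')
two_spaces = pattern.search('the hay  barn has cabins and hay.')

print(one_space)            # None: "hay " sits right before "barn"
print(two_spaces.group(1))  # barn: the lookbehind only sees "ay  " before "barn"
```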

A more robust workaround is to use a regex that matches (without capturing) the keywords when they follow hay, and that matches and captures those words in all other contexts. Something like

\bhay\s*(?:antique|historic|bungalow|cabin|barn)\b|\b(antique|historic|bungalow|cabin|barn)\b

All the matches you need will be in Group 1. Matches produced by the first alternative leave Group 1 empty and can be discarded.
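A minimal sketch of collecting those Group 1 values, using Python's `re` in place of the Java engine (an assumption). It iterates over every match and keeps only those where Group 1 participated. The function name `keywords_not_after_hay` and the sample sentence are illustrative.

```python
import re

KEYWORDS = 'antique|historic|bungalow|cabin|barn'
# Alternative 1 consumes "hay <keyword>" without capturing it;
# alternative 2 captures the keyword in every other context.
pattern = re.compile(rf'\bhay\s*(?:{KEYWORDS})\b|\b({KEYWORDS})\b')

def keywords_not_after_hay(text: str) -> list[str]:
    # Matches from alternative 1 have Group 1 set to None, so they are skipped.
    return [m.group(1) for m in pattern.finditer(text) if m.group(1)]

print(keywords_not_after_hay('the hay   barn has a cabin and an antique barn.'))
# ['cabin', 'antique', 'barn']
```

Because alternative 1 consumes `hay   barn` as a whole, the engine never gets a chance to start a match in the middle of the whitespace. That is exactly where the lookbehind-based patterns break.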

Upvotes: 1
