geotheory

Reputation: 23630

Regex inconsistency when using parentheses

Can anyone help me understand why the third call below returns FALSE while the others return TRUE?

library(stringr)
x = "The quick brown fox jumps over the lazy dog"
str_detect(x, 'dog')
#> [1] TRUE
str_detect(x, '(?=dog)')
#> [1] TRUE
str_detect(x, '(?=quick)(?=dog)') # fails why?
#> [1] FALSE
str_detect(x, '(?=quick)(?=.*dog)')
#> [1] TRUE

Upvotes: 1

Views: 161

Answers (1)

akuiper

Reputation: 214927

From the documentation, look ahead and look behind:

are zero-length assertions; They do not consume characters in the string, but only assert whether a match is possible or not.

So the regex (?=quick)(?=dog) first matches (?=quick) at the position just before quick:

The quick brown fox jumps over the lazy dog
   ^^  # this position

Since a lookahead doesn't consume characters, the position stays right before quick after the match, and the engine then tests the next pattern, (?=dog), at that same position. That fails because the text there starts with quick, not dog. In fact, you will never find a position that is immediately followed by both quick and dog.
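To make the "same position" point concrete, here is a rough base R sketch (not how the regex engine is actually implemented) that checks every character position of x by hand. The names suffixes, starts_quick and starts_dog are made up for this illustration.

```r
x <- "The quick brown fox jumps over the lazy dog"

# The text that follows each position: suffixes[i] is x from character i to the end
suffixes <- substring(x, seq_len(nchar(x)))

# Hand-rolled stand-ins for (?=quick) and (?=dog) tested at each position
starts_quick <- startsWith(suffixes, "quick")
starts_dog   <- startsWith(suffixes, "dog")

which(starts_quick)
#> [1] 5
which(starts_dog)
#> [1] 41

# (?=quick)(?=dog) needs both to hold at the SAME position
any(starts_quick & starts_dog)
#> [1] FALSE
```

The two assertions hold at different positions (5 and 41), so no single position satisfies both.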

You will find that it does work if one of the patterns is a prefix of the other, like quick and qui:

x = "The quick brown fox jumps over the lazy dog"
str_detect(x, '(?=quick)(?=qui)')
#> [1] TRUE

(?=quick)(?=.*dog), on the other hand, tests (?=.*dog) at that same position, the one where (?=quick) matched:

The quick brown fox jumps over the lazy dog
   ^^  # this position

This assertion is TRUE, since the text from that position onward, quick brown fox jumps over the lazy dog, matches .*dog.
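If what you actually want is "the string contains both words, in any order", one common way to write it (a sketch, assuming a single-line string, since . does not match line breaks by default) is to anchor at the start and give each lookahead its own .*. The variable names both and both_reversed are just for illustration.

```r
library(stringr)

x <- "The quick brown fox jumps over the lazy dog"

# Both lookaheads are tested at the start of the string; each one scans ahead on its own
both          <- "^(?=.*quick)(?=.*dog)"
both_reversed <- "^(?=.*dog)(?=.*quick)"

str_detect(x, both)
#> [1] TRUE
str_detect(x, both_reversed)   # order of the lookaheads does not matter
#> [1] TRUE
str_detect("The lazy dog", both)   # no "quick", so the first lookahead fails
#> [1] FALSE
```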

Upvotes: 4