Reputation: 23630
Can anyone help me understand why the third call below returns FALSE?
require(stringr)
x = "The quick brown fox jumps over the lazy dog"
str_detect(x, 'dog')
#> [1] TRUE
str_detect(x, '(?=dog)')
#> [1] TRUE
str_detect(x, '(?=quick)(?=dog)') # fails why?
#> [1] FALSE
str_detect(x, '(?=quick)(?=.*dog)')
#> [1] TRUE
Upvotes: 1
Views: 161
Reputation: 214927
From the documentation, lookahead and lookbehind are zero-length assertions: they do not consume characters in the string, but only assert whether a match is possible or not.
So the regex (?=quick)(?=dog) will first match (?=quick) here:
The quick brown fox jumps over the lazy dog
    ^ # this position
Since a lookahead doesn't consume characters, the position stays right before quick after the match, and the engine goes on to test the next pattern, (?=dog), at that same position. That fails, because the text at that position is quick, not dog. In fact, you will never find a position that is immediately followed by both quick and dog.
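To make the "same position" point concrete, here is a small sketch that uses str_locate (also from stringr) to report where each lookahead succeeds on its own. The variable names pos_quick and pos_dog are just illustrative, and the index comments assume 1-based character positions.

```r
library(stringr)

x <- "The quick brown fox jumps over the lazy dog"

# A lookahead matches a position rather than text, so only the start index matters here.
pos_quick <- str_locate(x, "(?=quick)")[1, "start"]
pos_dog   <- str_locate(x, "(?=dog)")[1, "start"]

pos_quick  # 5: the position right before "quick"
pos_dog    # 41: the position right before "dog"

# Both lookaheads would have to succeed at one and the same position,
# but the two positions differ, so the combined pattern cannot match.
pos_quick == pos_dog
```

Since the two positions differ, (?=quick)(?=dog) has no position where both assertions hold.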
You will find this works if one of the patterns is a prefix of the other, like quick and qui:
x = "The quick brown fox jumps over the lazy dog"
str_detect(x, '(?=quick)(?=qui)')
# [1] TRUE
(?=quick)(?=.*dog), on the other hand, tries to match (?=.*dog) at the position where (?=quick) matched:
The quick brown fox jumps over the lazy dog
    ^ # this position
This asserts TRUE, since quick brown fox jumps over the lazy dog can match .*dog.
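A related point: a lookahead only looks forward from the current position, so the order of the two words matters unless every lookahead starts from the beginning of the string. The sketch below is one possible way to test for both words in any order. The anchored pattern is my own suggestion rather than something from the question, and the variable names are illustrative.

```r
library(stringr)

x <- "The quick brown fox jumps over the lazy dog"

# Anchored at the start, each lookahead scans the whole string independently,
# so the order in which the words are listed no longer matters.
both_any_order <- str_detect(x, "^(?=.*dog)(?=.*quick)")

# Unanchored, with no .* in the first lookahead, order matters:
# at the position before "dog" there is no "quick" further ahead.
dog_then_quick <- str_detect(x, "(?=dog)(?=.*quick)")

both_any_order  # TRUE
dog_then_quick  # FALSE
```

Note that . does not match a newline by default, so this assumes the text is on a single line.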
Upvotes: 4