Why does this regex pattern match this string?

Pattern:

"d`?(?!([\\s]*<-))"

String:

"d` <-"

According to R, this is a match:

> grepl("d`?(?!([\\s]*<-))", "d` <-", perl = TRUE)
[1] TRUE

That doesn't make sense to me: the d matches and the optional (0 or 1) backtick matches, but " <-" comes next, so the negative lookahead should reject the match, shouldn't it?

Upvotes: 2

Views: 144

Answers (2)

anubhava

Reputation: 786081

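Because `? is optional, the regex engine can backtrack to satisfy the negative lookahead. It first matches d followed by the backtick; at that position the lookahead sees " <-" and fails. The engine then gives the backtick back, so `? matches nothing. Now the lookahead is tested right after d, where the next character is the backtick itself, so \s*<- cannot match there and the negative lookahead succeeds. The overall match is just "d".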
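A quick way to confirm this is to look at what was matched rather than only whether there was a match. This is a minimal sketch using base R's regexpr and regmatches; the variable names are just for illustration:

```r
# Original pattern and string from the question
pattern <- "d`?(?!([\\s]*<-))"
string  <- "d` <-"

# regexpr returns the position and length of the first match
m <- regexpr(pattern, string, perl = TRUE)

# Extract the matched substring: only "d", without the backtick
matched <- regmatches(string, m)
print(matched)
#> [1] "d"
```

The backtick is not part of the match, which shows the engine backtracked out of `? before the lookahead was accepted.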

With perl = TRUE (PCRE), you can use a possessive quantifier, which does not backtrack:

d`?+(?!\s*<-)


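Here the backtick is still matched optionally, but ?+ makes the quantifier possessive: once the backtick has been consumed, the engine cannot give it back. The negative lookahead is therefore only ever tested after the backtick, where it sees " <-", so the match fails as intended.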
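As a rough check in R, the possessive pattern can be tried against the original string and two other inputs. The extra test strings ("d` + 1" and "d <-") are made up for illustration and are not from the question:

```r
# Possessive version of the pattern (backslash doubled for the R string)
possessive <- "d`?+(?!\\s*<-)"

# Original string: the backtick cannot be given back, so no match
r1 <- grepl(possessive, "d` <-", perl = TRUE)

# No assignment arrow after the backtick: still matches
r2 <- grepl(possessive, "d` + 1", perl = TRUE)

# No backtick at all, but an arrow follows: rejected by the lookahead
r3 <- grepl(possessive, "d <-", perl = TRUE)

print(c(r1, r2, r3))
#> [1] FALSE  TRUE FALSE
```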

Upvotes: 1

Ryszard Czech

Reputation: 18641

The reason is backtracking, as explained in Wiktor Stribizew's comment.

Move the lookahead so it comes directly after d, and put the optional backtick inside it:

d(?!`?(\s*<-))`?


Explanation

  d                        'd'
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    `?                       '`' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                               more times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
      <-                       '<-'
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  `?                       '`' (optional (matching the most amount
                           possible))
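The pattern d(?!`?(\s*<-))`? can be checked in R the same way. This is a small sketch; the second and third test strings are invented for illustration:

```r
# Lookahead placed right after d, with the optional backtick inside it
moved <- "d(?!`?(\\s*<-))`?"

# Original string: the lookahead sees "` <-" and rejects the match
a1 <- grepl(moved, "d` <-", perl = TRUE)

# Arrow without a backtick: also rejected
a2 <- grepl(moved, "d <-", perl = TRUE)

# No arrow: matches, and the trailing `? consumes the backtick
s  <- "d` + 1"
a3 <- regmatches(s, regexpr(moved, s, perl = TRUE))

print(c(a1, a2))
#> [1] FALSE FALSE
print(a3)
#> [1] "d`"
```

Because the lookahead runs before the backtick is consumed, backtracking in the trailing `? no longer changes where the lookahead is tested.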

Upvotes: 1
