Bram Van Rensbergen

Reputation: 387

R: why does str_detect return a different result than grepl when using a word boundary on 'words' ending with a dash?

The help page for str_detect states "Equivalent to grepl(pattern, x)", however:

library(stringr)
str_detect("ALL-", str_c("\\b", "ALL-", "\\b"))
[1] FALSE

whereas

grepl(str_c("\\b", "ALL-", "\\b"), "ALL-")
[1] TRUE

I imagine one of these is not working as intended? Or am I missing something?

Upvotes: 5

Views: 395

Answers (1)

Jaccar

Reputation: 1854

Adding the argument perl = TRUE to grepl() makes it return the same result as str_detect():

> grepl(str_c("\\b", "ALL-", "\\b"), "ALL-")
[1] TRUE
> grepl(str_c("\\b", "ALL-", "\\b"), "ALL-", perl = TRUE)
[1] FALSE

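This argument makes grepl() use Perl Compatible Regular Expressions (PCRE) instead of its default POSIX extended engine (TRE). str_detect() uses yet another engine, ICU (via the stringi package), which treats \b the same way PCRE does in this case.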
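To see why PCRE (and, judging by the str_detect() output in the question, ICU) rejects the pattern, it helps to spell out what \b means there: a position where exactly one neighbour is a word character, with the start and end of the string counting as non-word. The helper word_boundaries() below is hypothetical (not part of base R or stringr) and approximates word characters with [[:alnum:]_]; it is a sketch that lists the 0-based offsets where \b could match.

```r
# Hypothetical helper, not part of base R or stringr: list the 0-based
# offsets in `x` where a PCRE/ICU-style \b could match. Word characters are
# approximated by [[:alnum:]_]; the positions before the first and after the
# last character count as non-word.
word_boundaries <- function(x) {
  chars <- strsplit(x, "")[[1]]
  is_word <- grepl("^[[:alnum:]_]$", chars)
  # Pad with FALSE on both sides to model the string edges as non-word
  padded <- c(FALSE, is_word, FALSE)
  # Offset k lies between padded[k + 1] and padded[k + 2]; it is a boundary
  # when exactly one of the two sides is a word character
  which(padded[-length(padded)] != padded[-1]) - 1L
}

word_boundaries("ALL-")
# [1] 0 3
grepl("\\bALL-\\b", "ALL-", perl = TRUE)
# [1] FALSE
```

For "ALL-" there is a boundary before "A" (offset 0) and between "L" and "-" (offset 3), but none at offset 4 after the trailing "-", because both the dash and the end of the string are non-word. A trailing \b placed after "ALL-" therefore has nothing to match.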

?grep contains the following warning, which may be related:

The POSIX 1003.2 mode of gsub and gregexpr does not work correctly with repeated word-boundaries (e.g., pattern = "\b"). Use perl = TRUE for such matches (but that may not work as expected with non-ASCII inputs, as the meaning of ‘word’ is system-dependent).
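If the goal is to match ALL- as a standalone token, one option that avoids relying on \b at a dash is to use lookarounds, which only assert that no word character sits directly before or after the token. This is a sketch assuming perl = TRUE; standalone_pattern() is a hypothetical helper, and \Q...\E is used to quote the token literally (so the token is assumed not to contain "\E").

```r
# Hypothetical pattern builder (not a base R or stringr function): match
# `token` literally, but only when it is not directly preceded or followed
# by a word character. Assumes `token` does not contain "\E".
standalone_pattern <- function(token) {
  paste0("(?<!\\w)\\Q", token, "\\E(?!\\w)")
}

pat <- standalone_pattern("ALL-")
grepl(pat, c("ALL-", "x ALL- y", "ALL-x", "BALL-"), perl = TRUE)
# [1]  TRUE  TRUE FALSE FALSE
```

ICU also supports lookbehind and \Q...\E quoting, so the same pattern should carry over to str_detect(), although that is not exercised here.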

Upvotes: 1
