Regex issue in R when escaping regex special characters with str_extract

Question

I'm trying to extract the status (in this case, the word "Active") from this text:

Status\nActive\nHometown\n

I'm using this regex: https://regex101.com/r/xegX00/1, but I cannot get it to work in R with str_extract. The double escapes seem weird, but I've tried every combination I can think of and still cannot get it to work. Any help appreciated!

mutate(status=str_extract(df, "(?<=Status\\n)(.*?)(?=\\)"))

Wiktor Stribiżew · Accepted Answer

Your regex fails because you tested it against the wrong text.

"Status\nActive\nHometown\n" is a string literal that denotes (defines, represents) the following plain text:

Status
Active
Hometown

In regular expression testers, you need to test against plain text!
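A quick way to see the plain text that an R string literal denotes (and therefore what to paste into a regex tester) is to compare print() with cat() or writeLines(). This is a minimal sketch reusing the sample text from the question; the same trick shows which regex the engine actually receives from a double-escaped pattern string.

```r
# The string literal from the question, written with escapes
x <- "Status\nActive\nHometown\n"

# print() shows the escaped literal form
print(x)
## => [1] "Status\nActive\nHometown\n"

# cat() shows the plain text the literal denotes; this is what a regex tester needs
cat(x)
## => Status
## => Active
## => Hometown

# writeLines() on a pattern string shows the regex the engine receives
writeLines("(?<=Status\\n).*")
## => (?<=Status\n).*
```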

To match a newline, you can use "\n" (i.e. a literal line feed (LF) char) or "\\n", a regex escape that also matches a line feed char.
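Both forms can be checked directly in R. This is a minimal sketch using stringr's str_detect() on the sample text from the question (the variable name txt is only for illustration): "\n" puts one LF character into the pattern, while "\\n" puts a backslash and an "n" there, which the regex engine then interprets as the \n escape.

```r
library(stringr)

txt <- "Status\nActive\nHometown\n"

# "\n" in an R string is already a single LF character
nchar("\n")   ## => [1] 1
# "\\n" is two characters (backslash + n); the regex engine reads them as \n
nchar("\\n")  ## => [1] 2

# Both patterns match the line feed right after "Status"
str_detect(txt, "Status\n")   ## => [1] TRUE
str_detect(txt, "Status\\n")  ## => [1] TRUE
```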

You can use

library(stringr)
x <- "Status\nActive\nHometown\n"
# regex escape \n (double backslash in the R string)
stringr::str_extract(x, "(?<=Status\\n).*") ## => [1] "Active"
## or
# literal LF char in the pattern
stringr::str_extract(x, "(?<=Status\n).*")  ## => [1] "Active"

See the R demo online and a correct regex test.
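In the question, str_extract() received the whole data frame (df) rather than a column. A minimal sketch of the same pattern inside dplyr's mutate(), assuming a hypothetical data frame whose text lives in a column named info (both the column name and the second row are illustrative only):

```r
library(dplyr)
library(stringr)

# Hypothetical data: a column named `info` holding the multi-line text
df <- data.frame(info = c("Status\nActive\nHometown\n",
                          "Status\nInactive\nHometown\n"))

# Pass the column, not the whole data frame, to str_extract()
df <- df %>%
  mutate(status = str_extract(info, "(?<=Status\\n).*"))

# One extracted status per row
df$status
```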

Note that you do not need a \n at the end of the pattern: in the ICU regex flavor (used by the stringr regex functions in R), the . pattern matches any char other than line break chars, so .* on its own matches the rest of the line.
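To see why .* stops at the end of the line, it can be contrasted with stringr's regex() modifier and its dotall option, which lets . match line breaks too. This is a small sketch on the same sample text:

```r
library(stringr)

x <- "Status\nActive\nHometown\n"

# Default: "." does not match line breaks, so .* grabs only the rest of the line
str_extract(x, "(?<=Status\\n).*")
## => [1] "Active"

# With dotall = TRUE, "." also matches line breaks and .* runs to the end of the string
str_extract(x, regex("(?<=Status\\n).*", dotall = TRUE))
## => [1] "Active\nHometown\n"
```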
