Reputation: 170
I'm trying to extract the status (in this case, the word "Active") from this pattern:
Status\nActive\nHometown\
I'm using this regex: https://regex101.com/r/xegX00/1, but I cannot get it to work in R with str_extract. Doubling the escapes seems weird, but I've tried every combination I can think of and still cannot get it to work. Any help appreciated!
mutate(status=str_extract(df, "(?<=Status\\\\n)(.*?)(?=\\\\)"))
Upvotes: 2
Views: 133
Reputation: 626926
Your regex fails because you tested it against the wrong text.
"Status\nActive\nHometown"
is a string literal that denotes (defines, represents) the following plain text:
Status
Active
Hometown
In regular expression testers, you need to test against plain text!
To match a newline, you can use "\n" (i.e. a line feed char, an LF char), or "\\n", a regex escape that also matches a line feed char.
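To make the difference between a string literal and the text it denotes concrete, here is a small sketch using the sample string from the question. It assumes stringr is installed; the four-backslash pattern is a simplified version of the one in the question, kept only to show why it cannot match.

```r
library(stringr)

# In R source, "\n" inside quotes is an escape sequence; the stored text holds a real line feed.
x <- "Status\nActive\nHometown\n"
cat(x)  # shows the plain text: Status, Active, Hometown on separate lines

len_lf  <- nchar("\n")   # one char: the LF itself
len_esc <- nchar("\\n")  # two chars: a backslash followed by the letter n

# Both patterns match the LF char. The first hands a literal LF to the regex engine,
# the second hands it the two-char regex escape \n.
hit_literal <- str_detect(x, "Status\nActive")
hit_escape  <- str_detect(x, "Status\\nActive")

# Four backslashes make the regex look for a literal backslash followed by "n",
# and the text contains no backslash at all.
hit_quad <- str_detect(x, "Status\\\\nActive")

print(c(len_lf, len_esc))
print(c(hit_literal, hit_escape, hit_quad))
```

The last check is why the doubled escapes in the question never match: they describe the characters you typed into the tester, not the text R actually holds.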
You can use
library(stringr)
x <- "Status\nActive\nHometown\n"
stringr::str_extract(x, "(?<=Status\\n).*") ## => [1] "Active"
## or
stringr::str_extract(x, "(?<=Status\n).*") ## => [1] "Active"
See the R demo online and a correct regex test.
Note that you do not need "\n" at the end of the pattern: in the ICU regex flavor (used by the R stringr regex functions), the "." pattern matches any char other than line break chars, so it is OK to just use ".*" to match the rest of the line.
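As a quick illustration of the point about "." and line breaks, the sketch below runs the same pattern twice, once with default settings and once with stringr's regex(..., dotall = TRUE) modifier. The variable names are only for illustration.

```r
library(stringr)

x <- "Status\nActive\nHometown\n"

# Default: "." does not match line break chars, so ".*" stops at the end of the line.
one_line <- str_extract(x, "(?<=Status\\n).*")

# With dotall = TRUE, "." also matches line breaks, so ".*" runs to the end of the string.
all_rest <- str_extract(x, regex("(?<=Status\\n).*", dotall = TRUE))

print(one_line)
print(all_rest)
```

If "." ever needs to cross lines, the dotall modifier is the switch to flip; for pulling out a single line, the default behaviour is exactly what you want.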
Upvotes: 0
Reputation: 389047
You can use sub in base R:
x <- "Status\nActive\nHometown\n"
sub('.*Status\n(.*?)\n.*', '\\1', x)
#[1] "Active"
If you want to use stringr, here is a suggestion with str_match, which uses a capture group and so avoids lookaround (lookbehind/lookahead) regex:
stringr::str_match(x, 'Status\n(.*)\n')[, 2]
#[1] "Active"
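Since the question calls str_extract inside mutate, here is a rough sketch of the same str_match approach applied to a data frame column. The data frame and the column name profile are hypothetical (the question does not show the real data), and it assumes dplyr and stringr are installed.

```r
library(dplyr)
library(stringr)

# Hypothetical data: the column name `profile` and its values are made up for illustration.
df <- data.frame(
  profile = c("Status\nActive\nHometown\n",
              "Status\nInactive\nHometown\n",
              "Name\nBob\n"),
  stringsAsFactors = FALSE
)

# Pass the column (not the whole data frame) to str_match;
# column 2 of the result holds the first capture group.
df <- df %>%
  mutate(status = str_match(profile, "Status\\n(.*)\\n")[, 2])

print(df$status)
```

Rows without a "Status" line get NA, which is usually easier to handle downstream than an error or an empty string.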
Upvotes: 3