wtrs

R: ignore a character within a regex string

I've looked everywhere for a regex construct that makes R disregard the next character within a regular expression string.

For example, given myvector:

 myvector <- c("abcdef", "ghijkl", "mnopqrs")

and a regex string:

 regexstring <- "[a-z]{3}XXXXXXXXX "

where XXXXXXXXX stands for some characters I don't know yet. I want those characters to tell R to ignore the final space in the regular expression string itself.

Running the following

regexstring <- "[a-z]{3} "
sub(regexstring, " ", myvector)

gives,

"abcdef"  "ghijkl"  "mnopqrs"

because none of the strings contains a space. Ideally, once XXXXXXXXX is in place, I would get the same output as if I had run

regexstring <- "[a-z]{3}"
sub(regexstring, " ", myvector)

which is:

 " def"  " jkl"  " pqrs"

I can't erase the final space or use trimws() and the like, and I don't see any way to make R disregard the final space. Is there an XXXXXXXXX that does this?


Answers (2)

wtrs

Building on Wiktor Stribiżew's answer, I was able to figure out how to do this with stringr:

library(stringr)
myvector    <- c("abcdef", "ghijkl", "mnopqrs")
regexstring <- regex("[a-z]{3}# ", comments = TRUE)
myvector %>% str_replace(regexstring, " ")

[1] " def"  " jkl"  " pqrs"

This way, I'm able to modify the regex string itself (regexstring) rather than the replacement command (sub or str_replace).
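For comparison, here is a base-R sketch of the same "#" trick. It is not part of the original answer; it assumes the PCRE engine (perl = TRUE), where the (?x) modifier also makes "#" start a comment that runs to the end of the line, so the trailing space after it is ignored.

```r
myvector <- c("abcdef", "ghijkl", "mnopqrs")

# (?x) switches PCRE into free-spacing mode; from there on, "#" begins a
# comment, so the trailing space (and anything else after "#") is ignored.
regexstring <- "[a-z]{3}(?x)# "

out <- sub(regexstring, " ", myvector, perl = TRUE)
print(out)
## => [1] " def"  " jkl"  " pqrs"
```

For a lone trailing space, the "#" is not strictly required in either engine, since free-spacing mode already ignores whitespace. It only matters if the unwanted trailing text also contains non-whitespace characters.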


Wiktor Stribiżew

You can turn the final space into a formatting (ignored) space by putting the (?x) free-spacing inline modifier in place of the XXXXXXXXX, and passing perl=TRUE so that the pattern is parsed by the PCRE regex engine.

myvector <- c("abcdef", "ghijkl", "mnopqrs")
regexstring <- "[a-z]{3}(?x) "
sub(regexstring, " ", myvector, perl=TRUE) 
## => [1] " def"  " jkl"  " pqrs"


Note that placing (?x) in the middle of the pattern affects all literal whitespace to its right, up to either the end of the pattern or a (?-x) modifier, which switches free-spacing mode off again.
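A small sketch of that scoping rule, using made-up test strings (illustrative only, not from the question): free-spacing is switched on with (?x) and off again with (?-x), so only the space in the first half of the pattern is ignored, while the space before "c" must match literally.

```r
x <- c("ab c", "abc", "a b c")

# (?x) turns free-spacing on: the space between "a" and "b" is ignored,
# so this part of the pattern matches "ab".
# (?-x) turns it off again: the space before "c" is matched literally.
pattern <- "(?x)a b(?-x) c"

matches <- grepl(pattern, x, perl = TRUE)
print(matches)
## => [1]  TRUE FALSE FALSE
```

Only "ab c" matches: "abc" lacks the literal space before "c", and "a b c" never has "a" directly followed by "b".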
