tall_table

Reputation: 311

breaking up a long regular expression in R

Problem: I am using R with stringr, and I have a very long regular expression built with the "or" operator (|) that I save to an object and pass to stringr functions. How can I break it across multiple lines in R so that I do not have to keep scrolling to the right in my source editor? When I try separating the pieces with commas, only the first line is recognized. Most answers to this question cover other programming languages, not R.

regex_of_sites <- "side|southeast|north|computer|engineer|first|south|pharm|left|southwest|level|second|thirteenth"

Upvotes: 3

Views: 662

Answers (2)

Wiktor Stribiżew

Reputation: 627190

Since you are using the pattern with stringr functions, which use the ICU regex flavor, you can add the (?x) free-spacing modifier (also called verbose, or ignore pattern whitespace). With it, all unescaped whitespace is ignored when the pattern is compiled, and you can add a comment after an unescaped # on each line. The trade-off is that every literal # and every literal whitespace character in the pattern must be escaped.

Here is an example:

> library(stringr)
> regex_of_sites <- "(?x)side     # Term 0
+ |southeast                      # Term 1
+ |north                          # Term 2
+ |computer                       # Term 3
+ |engineer
+ |first
+ |south
+ |pharm
+ |left
+ |southwest
+ |level
+ |second
+ |thirteenth"
> str_extract_all("first level", regex_of_sites)
[[1]]
[1] "first" "level"

The same modifier is supported by the PCRE patterns used in base R regex functions with perl=TRUE.
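
As a hedged illustration of the base R route, the sketch below uses gregexpr() and regmatches() with perl = TRUE. The terms "north wing" and "room #12" and the sample string are made up for this example (they are not from the question); they are there to show that a literal space or # has to be escaped once (?x) is switched on.

```r
# Hypothetical terms and input, for illustration only.
pattern <- "(?x)
    north\\ wing     # escaped space is matched literally
  | room\\ \\#\\d+   # escaped hash sign is matched literally
  | level
"

x <- "north wing, room #12, level"

# perl = TRUE selects PCRE, which understands the (?x) modifier
hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(hits)
```

Without the backslashes, the space in "north wing" would be dropped from the pattern, and everything from the # to the end of that line would be treated as a comment.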

Upvotes: 6

MrFlick

Reputation: 206466

The regular expression is just a string, so you can paste it together across multiple lines like any other string:

regex_of_sites <- paste0("side|southeast|north|computer|engineer|",
     "first|south|pharm|left|southwest|",
     "level|second|thirteenth")
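
A closely related variant, which is not part of the answer above but is a common R idiom, is to keep the terms in a character vector and join them with paste(..., collapse = "|"). This is a minimal sketch, assuming none of the terms contains regex metacharacters (they are inserted verbatim); the name `sites` is introduced here only for illustration.

```r
# One term per element, so the list is easy to wrap, reorder, and diff.
sites <- c(
  "side", "southeast", "north", "computer", "engineer",
  "first", "south", "pharm", "left", "southwest",
  "level", "second", "thirteenth"
)

# Join the terms with the alternation operator
regex_of_sites <- paste(sites, collapse = "|")

# Quick check with base R; the same string can be passed to stringr functions
hits <- regmatches("first level", gregexpr(regex_of_sites, "first level"))[[1]]
print(hits)
```

The resulting string is identical to the single-line pattern in the question, so it can be used as a drop-in replacement.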

Upvotes: 4
