Reputation: 5033
I have a function:
ncount <- function(num = NULL) {
toRead <- readLines("abc.txt")
n <- as.character(num)
x <- grep("{"n"} number",toRead,value=TRUE)
}
When calling grep, I want the num passed to the function to be used to build the search pattern dynamically. How can this be done in R? Every line of the text file contains a number and some text.
Upvotes: 1
Views: 6987
Reputation: 626747
To build a regular expression from variables in R, in the current scenario you can simply concatenate string literals with your variable using paste0:
grep(paste0('\\{', n, '} number'), homicides, value=TRUE)
Note that { is a special character outside a [...] bracket expression (also called a character class), and should be escaped if you need to find a literal { char.
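Putting this together for the function in the question, a minimal sketch might look like the following. The sample lines are hypothetical stand-ins for the contents of abc.txt, the second argument is added only so the example runs without a file, and perl = TRUE is an assumption made here to keep the brace handling unambiguous.

```r
# Hypothetical sample lines standing in for readLines("abc.txt")
sample_lines <- c("{3} number of apples",
                  "{12} number of pears",
                  "3 number without braces")

ncount <- function(num, lines) {
  # Build the pattern from the argument; '\\{' matches a literal brace
  pattern <- paste0("\\{", num, "\\} number")
  grep(pattern, lines, value = TRUE, perl = TRUE)
}

result <- ncount(3, sample_lines)
print(result)
```

Only the line containing the literal text {3} number is returned; the other two lines do not match.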
If you need to match any one of several items (an alternation), you can combine paste and paste0:
words <- c('bananas', 'mangoes', 'plums')
regex <- paste0('Ben likes (', paste(words, collapse='|'), ')\\.')
The resulting regex, Ben likes (bananas|mangoes|plums)\. , will match "Ben likes bananas.", "Ben likes mangoes." or "Ben likes plums."
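As a quick check of the alternation pattern, the sketch below tests it against a few sentences. The sentences vector is a hypothetical input made up for illustration.

```r
words <- c("bananas", "mangoes", "plums")
regex <- paste0("Ben likes (", paste(words, collapse = "|"), ")\\.")

# Hypothetical test sentences
sentences <- c("Ben likes bananas.", "Ben likes kiwis.", "Ben likes plums.")

# TRUE where a sentence contains one of the listed alternatives
matched <- grepl(regex, sentences, perl = TRUE)
print(matched)
```

The second sentence is the only one that fails to match, because kiwis is not in the words vector.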
NOTE: PCRE (used when you pass perl=TRUE to base R regex functions) and ICU (used by the stringr/stringi regex functions) handle these scenarios better than the default TRE regex library used in base R regex functions, so it is recommended to use those engines.
Oftentimes, you will want to build a pattern from a list of words that should be matched exactly, as whole words. The right approach depends on the type of boundaries you need and on whether the words can contain special regex metacharacters or whitespace.
In the most general case, word boundaries (\b) work well.
regex <- paste0('\\b(', paste(words, collapse='|'), ')\\b')
unlist(regmatches(examples, gregexpr(regex, examples, perl=TRUE)))
## => [1] "bananas" "mangoes" "plums"
The \b(bananas|mangoes|plums)\b pattern will match bananas, but won't match banana.
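A self-contained version of the word boundary example could look like this. The input string is a hypothetical sample, chosen so that it contains both banana and bananas.

```r
words <- c("bananas", "mangoes", "plums")
regex <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")

# Hypothetical input containing both "banana" and "bananas"
input <- "One banana, two bananas, and some plums."

# Extract every whole-word match
found <- unlist(regmatches(input, gregexpr(regex, input, perl = TRUE)))
print(found)
```

Here banana is skipped because the alternatives are matched in full, and the trailing \b only ensures that a match is not followed by another word character.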
If your list is like
words <- c('cm+km', 'uname\\vname')
you will have to escape the words first, i.e. prepend a \ to each metacharacter:
regex.escape <- function(string) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
}
examples <- c('Text: cm+km, and some uname\\vname?')
words <- c('cm+km', 'uname\\vname')
regex <- paste0('\\b(', paste(regex.escape(words), collapse='|'), ')\\b')
cat( unlist(regmatches(examples, gregexpr(regex, examples, perl=TRUE))) )
## => cm+km uname\vname
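To see what the escaping step does on its own, the following sketch prints the escaped form of each word. It reuses the regex.escape function from the answer; cat is used so that the strings appear as the regex engine receives them, without R's own string escaping.

```r
regex.escape <- function(string) {
  # Put a backslash in front of every regex metacharacter
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
}

escaped <- regex.escape(c("cm+km", "uname\\vname"))

# One escaped word per line, shown as the regex engine will see it
cat(escaped, sep = "\n")
```

The + in the first word and the backslash in the second each gain a preceding backslash, so both are matched literally once the words are joined into the final pattern.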
If your words can start or end with a special regex metacharacter, \b word boundaries won't work, because \b requires a word character on one side of the boundary. Use:
- (?<!\w) / (?!\w), when the match is expected between non-word chars or at the start/end of the string
- (?<!\S) / (?!\S), when the match is expected to be enclosed with whitespace chars or to sit at the start/end of the string
Example of these two approaches in R (each match is replaced with itself enclosed in << and >>):
regex.escape <- function(string) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
}
examples <- 'Text: cm+km, +km and C++,Delphi,C++CLI and C++/CLI.'
words <- c('+km', 'C++')
# Unambiguous word boundaries
regex <- paste0('(?<!\\w)(', paste(regex.escape(words), collapse='|'), ')(?!\\w)')
gsub(regex, "<<\\1>>", examples, perl=TRUE)
# => [1] "Text: cm+km, <<+km>> and <<C++>>,Delphi,C++CLI and <<C++>>/CLI."
# Whitespace boundaries
regex <- paste0('(?<!\\S)(', paste(regex.escape(words), collapse='|'), ')(?!\\S)')
gsub(regex, "<<\\1>>", examples, perl=TRUE)
# => [1] "Text: cm+km, <<+km>> and C++,Delphi,C++CLI and C++/CLI."
Upvotes: 0
Reputation: 81683
You could use paste to concatenate strings:
grep(paste("{", n, "} number", sep = ""),homicides,value=TRUE)
Upvotes: 5