Reputation: 109
Thanks to the question "grep using a character vector with multiple patterns", I figured out my own problem as well. The question there was how to find multiple values with the grep function, and the solution was one of these:
grep("A1|A9|A6", text)
or
toMatch <- c("A1", "A9", "A6")
matches <- unique(grep(paste(toMatch, collapse = "|"), text))
So I used the second suggestion since I had MANY values to search for.
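For reference, here is a minimal sketch of what the second approach does, using a made-up vector text (hypothetical data, not from the original post). paste() with collapse = "|" glues the patterns into one alternation regex. Spaces inside a regex are literal, so the patterns should be joined without them. Also note that plain substring matching lets "A1" match "A10"; the last step, an optional extra that is not part of the original suggestion, adds \b word boundaries.

```r
# Hypothetical example data (not from the original post)
text <- c("A1 start", "B2", "xA9", "A6 end", "A10")
toMatch <- c("A1", "A9", "A6")

# Build one alternation pattern: "A1|A9|A6"
pattern <- paste(toMatch, collapse = "|")
print(pattern)

# Indices of elements that contain any of the patterns
matches <- unique(grep(pattern, text))
print(matches)  # "A10" is included because it contains "A1"

# Optional: require whole words with \b boundaries (perl = TRUE for PCRE)
whole_words <- grep(paste0("\\b(", pattern, ")\\b"), text, perl = TRUE)
print(whole_words)
```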
But I'm curious why c() or a for loop doesn't work instead of |. Before I searched Stack Overflow and found the recommendations above, I tried two alternatives, which I'll demonstrate below:
First, I had written something like this in R:
find.explore.l <- lapply(text.words.bl, function(m) grep("^explor", m))
But then I had to grep for many words, so I tried this:
find.explore.l <- lapply(text.words.bl, function(m) grep(c("A1", "A2", "A3"), m))
It didn't work, so I tried another approach (XXX is the vector of words I need to find in the text):
for (i in XXX) {
  find.explore.l <- lapply(text.words.bl, function(m) grep("XXX[i]", m))
  # ... (more lines to append the results, etc.)
}
It seemed like R tried to match XXX[i] itself, not the words inside XXX. Why don't c() or a for loop make grep return the right results? Someone please let me know! I'm so curious :P
Upvotes: 3
Views: 1796
Reputation: 5586
From the documentation for the pattern argument of the grep() function:
Character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. Coerced by as.character to a character string if possible. If a character vector of length 2 or more is supplied, the first element is used with a warning. Missing values are allowed except for regexpr and gregexpr.
This confirms that, as @nrussell said in a comment, grep() is not vectorized over the pattern argument. Because of this, passing a vector built with c() won't work for a set of regular expressions.
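A quick way to see this, using a made-up vector text (hypothetical data): passing several patterns triggers the warning and only the first one is searched, whereas collapsing them with "|" searches all of them. The exact wording of the warning depends on the R version, so the sketch simply prints whatever message is raised.

```r
# Hypothetical data for illustration
text <- c("A1", "A2", "A3", "B1")
patterns <- c("A1", "A2", "A3")

# Only patterns[1] ("A1") is used; R emits a warning about the extra elements
first_only <- suppressWarnings(grep(patterns, text))
print(first_only)

# Capture the warning to see it explicitly
w <- tryCatch(grep(patterns, text), warning = function(w) w)
print(conditionMessage(w))

# Collapsing the patterns into one regex matches all three
all_three <- grep(paste(patterns, collapse = "|"), text)
print(all_three)
```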
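You could, however, use a loop; you just have to modify your syntax. Note that inside a for loop the value of grep() is not printed automatically, so store (or print) it:
toMatch <- c("A1", "A9", "A6")
# Loop over the patterns, storing the indices that each one matches
matches <- list()
for (i in toMatch) {
  matches[[i]] <- grep(i, text)
}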
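Using "XXX[i]" as your pattern doesn't work because the quoted string is interpreted as a regular expression: [i] is a character class matching the letter i, so the pattern matches any string containing XXXi. To reference an element of a vector of regular expressions, you would simply use XXX[i] (note the lack of surrounding quotes), with i running over positions, e.g. for (i in seq_along(XXX)). If you loop over the values themselves, as in for (i in XXX), use i directly.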
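A small sketch contrasting the three variants, with hypothetical patterns and text (not from the question): the quoted pattern finds nothing, while looping over values (using i) and looping over positions (using XXX[i]) give the same per-pattern results.

```r
# Hypothetical patterns and text for illustration
XXX <- c("explor", "analy")
text <- c("exploring data", "analysis", "summary", "explore more")

# Quoted: the regex XXX[i] looks for the literal letters "XXXi"
quoted <- grep("XXX[i]", text)

# Loop over the values: i already holds each pattern
by_value <- list()
for (i in XXX) by_value[[i]] <- grep(i, text)

# Loop over positions: index the vector with XXX[i] (no quotes)
by_index <- list()
for (i in seq_along(XXX)) by_index[[i]] <- grep(XXX[i], text)

str(by_value)
```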
You can use lapply() for this, but in a slightly different way than you did: apply it to each regex in the vector, rather than to each text string.
lapply(toMatch, function(rgx, text) grep(rgx, text), text = text)
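A small sketch of that lapply() call with hypothetical data: naming the result by pattern shows which regex matched where, and the union of the per-pattern indices equals what the single "|" pattern returns.

```r
# Hypothetical data for illustration
text <- c("A1 x", "A9 y", "B2", "A6 A1")
toMatch <- c("A1", "A9", "A6")

# One element per pattern: the indices of text that each regex matches
hits <- lapply(toMatch, function(rgx, text) grep(rgx, text), text = text)
names(hits) <- toMatch
str(hits)

# Union of all per-pattern hits, as sorted indices
combined <- sort(unique(unlist(hits, use.names = FALSE)))
print(combined)

# Same indices as the single alternation regex
alternation <- grep(paste(toMatch, collapse = "|"), text)
print(alternation)
```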
However, the best approach is the one you already have in your post:
matches <- unique(grep(paste(toMatch, collapse = "|"), text))
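One caveat worth knowing when collapsing many values: paste() does not escape anything, so regex metacharacters such as . or + in the values keep their special meaning. The sketch below assumes the values are meant as literal strings (hypothetical data) and wraps each one in PCRE's \Q...\E quoting before collapsing. This requires perl = TRUE, should not be used when the values are real regexes such as "^explor", and breaks if a value itself contains \E.

```r
# Hypothetical values containing regex metacharacters
toMatch <- c("a.b", "c+d")
text <- c("a.b", "axb", "c+d", "ccd")

# Naive alternation: "." matches any character and "+" means "one or more"
naive <- grep(paste(toMatch, collapse = "|"), text)

# Quote each value literally with \Q...\E (PCRE), then collapse with "|"
quoted <- grep(paste0("\\Q", toMatch, "\\E", collapse = "|"), text, perl = TRUE)

print(naive)
print(quoted)
```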
Upvotes: 1
Reputation: 28441
Consider that:
XXX <- c("a", "b", "XXX[i]")
grep("XXX[i]", XXX, value = TRUE)
character(0)
grep("XXX\\[i\\]", XXX, value = TRUE)
[1] "XXX[i]"
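What is R doing? It applies regular-expression rules to the first argument of grep. The brackets [ and ] are special characters: together they define a character class, so [i] matches the single letter i. I put in two backslashes to tell R to treat them as regular brackets. And imagine what would happen if I put that last expression into a for loop: it would search for the literal text XXX[i] on every iteration instead of the current element, which is not what I wanted.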
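Another way to get literal matching, sketched here with the same vector, is grep's fixed = TRUE argument, which skips regex interpretation entirely. The catch is that "|" also becomes literal, so fixed = TRUE cannot be combined with the paste(..., collapse = "|") trick.

```r
XXX <- c("a", "b", "XXX[i]")

# As a regex, [i] is a character class, so this searches for "XXXi"
regex_hits <- grep("XXX[i]", XXX, value = TRUE)

# fixed = TRUE matches the pattern as a literal string, brackets included
fixed_hits <- grep("XXX[i]", XXX, fixed = TRUE, value = TRUE)

# Caveat: with fixed = TRUE, "|" is literal too, so "a|b" matches nothing here
fixed_alt <- grep("a|b", XXX, fixed = TRUE)

print(regex_hits)
print(fixed_hits)
print(fixed_alt)
```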
If you would like a for loop that goes through a character vector of possible matches, take out the quotes around the loop variable in the grep call.
# If you want the matched values returned
matches <- c("a", "b")
for (i in matches) print(grep(i, XXX, value = TRUE))
[1] "a"
[1] "b"
# If you want the positions of the matches in the vector
for (i in matches) print(grep(i, XXX))
[1] 1
[1] 2
As the comments point out, grep(c("A1","A2","A3"), m) violates grep's required syntax: pattern must be a single string, so only the first element ("A1") is used, with a warning.
Upvotes: 0