Reputation: 67
I'm trying to match a name against the elements of another vector in R, but I don't know which escape sequence to use with grep().
name <- "Cry River"
string <- c("Yesterday Once More","Are You happy","Cry Me A River")
grep(name, string, value = TRUE)
I expect the output to be "Cry Me A River", but I don't know how to do it.
Upvotes: 2
Views: 72
Reputation: 3639
Here's an approach using stringr. Is order important? Is case important? Is it important to match whole words? If you just want to match 'Cry' and 'River' as whole words, in any order, and don't care about case:
library(stringr)

name <- "Cry River"
string <- c("Yesterday Once More",
            "Are You happy",
            "Cry Me A River",
            "Take me to the River or I'll Cry",
            "The Cryogenic River Rag",
            "Crying on the Riverside")
# Keep the strings that contain both whole words, in any order, ignoring case
string[str_detect(string, pattern = regex('\\bcry\\b', ignore_case = TRUE)) &
       str_detect(string, pattern = regex('\\bRiver\\b', ignore_case = TRUE))]
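The two patterns above are typed by hand. As a rough sketch, they could instead be built from name, so the same code works for other names. The variables words, patterns, hits and matched are illustrative names, not part of the original answer, and the sketch assumes the words in name contain no regex metacharacters.

```r
library(stringr)

name <- "Cry River"
string <- c("Yesterday Once More",
            "Are You happy",
            "Cry Me A River",
            "Take me to the River or I'll Cry",
            "The Cryogenic River Rag",
            "Crying on the Riverside")

# Split the name on whitespace and wrap each word in word boundaries
words <- str_split(name, "\\s+")[[1]]
patterns <- str_c("\\b", words, "\\b")

# One logical vector per word, then keep strings where every word was found
hits <- lapply(patterns, function(p) str_detect(string, regex(p, ignore_case = TRUE)))
matched <- string[Reduce(`&`, hits)]
matched
```

With the sample vector this should keep "Cry Me A River" and "Take me to the River or I'll Cry", and drop the "Cryogenic" and "Crying" entries because of the word boundaries.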
Upvotes: 0
Reputation: 886938
We can apply grepl to each word of the split string, then Reduce the list of logical vectors to a single logical vector and use it to extract the matching elements of 'string'
string[Reduce(`&`, lapply(strsplit(name, " ")[[1]], grepl, string))]
#[1] "Cry Me A River"
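Since the question mentions escaping: if the words in the name may contain regex metacharacters such as a dot, one option is to pass fixed = TRUE through to grepl so each word is matched literally. The sketch below uses made-up inputs (name2, string2) purely to show the difference.

```r
# Hypothetical inputs: the name contains ".", which is a regex metacharacter
name2 <- "Mr. River"
string2 <- c("Mr. River", "Mrs River")

words <- strsplit(name2, " ", fixed = TRUE)[[1]]

# Regex matching: "Mr." also matches "Mrs", because "." matches any character
regex_hits <- string2[Reduce(`&`, lapply(words, grepl, string2))]

# Literal matching: extra arguments to lapply are forwarded to grepl
fixed_hits <- string2[Reduce(`&`, lapply(words, grepl, string2, fixed = TRUE))]

regex_hits
fixed_hits
```

Here regex_hits keeps both strings, while fixed_hits keeps only "Mr. River".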
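Alternatively, instead of strsplit, we can use sub to replace the space with .* (note that sub replaces only the first space, and the words must then appear in the same order as in 'name')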
grep(sub(" ", ".*", name), string, value = TRUE)
#[1] "Cry Me A River"
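For names with more than two words, sub only rewrites the first space. A small sketch with a hypothetical three-word name (name3) shows the difference between sub and gsub:

```r
string <- c("Yesterday Once More", "Are You happy", "Cry Me A River")
name3 <- "Cry Me River"   # hypothetical three-word name

pattern_sub <- sub(" ", ".*", name3)    # only the first space is replaced
pattern_gsub <- gsub(" ", ".*", name3)  # every space is replaced

res_sub <- grep(pattern_sub, string, value = TRUE)
res_gsub <- grep(pattern_gsub, string, value = TRUE)

pattern_sub
pattern_gsub
res_sub
res_gsub
```

The sub pattern still requires the literal text "Me River", so it finds nothing here, whereas the gsub pattern allows other words between all three terms.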
Upvotes: 0
Reputation: 520908
Here is a base R option, using grepl:
name <- "Cry River"
parts <- paste0("\\b", strsplit(name, "\\s+")[[1]], "\\b")
string <- c("Yesterday Once More","Are You happy","Cry Me A River")
result <- sapply(parts, function(x) { grepl(x, string) })
string[rowSums(result) == length(parts)]
#[1] "Cry Me A River"
The strategy here is to first split the string containing the search terms and generate an individual regex pattern for each term. In this case, we generate:
\bCry\b and \bRiver\b
Then, we iterate over the terms, and using grepl we check, for each string, whether the term appears in it. Finally, we retain only those strings which contain all of the terms.
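To see what the \b word boundaries contribute, the sketch below compares matching with and without them on a few extra strings. The helper all_match and the extra strings are illustrative assumptions, not part of the answer above.

```r
string <- c("Cry Me A River",
            "The Cryogenic River Rag",
            "Crying on the Riverside")
words <- strsplit("Cry River", "\\s+")[[1]]

# Hypothetical helper: TRUE where every pattern is found in the element of x
all_match <- function(patterns, x) {
  Reduce(`&`, lapply(patterns, grepl, x = x))
}

# Without boundaries, "Cry" also matches inside "Cryogenic" and "Crying"
loose <- string[all_match(words, string)]

# With boundaries, only whole-word occurrences count
strict <- string[all_match(paste0("\\b", words, "\\b"), string)]

loose
strict
```

Without boundaries all three strings are kept; with them only "Cry Me A River" remains.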
Upvotes: 1
Reputation: 388817
Use .* in the pattern:
grep("Cry.*River", string, value = TRUE)
#[1] "Cry Me A River"
Or, if you receive name as is and can't change it, you can split it on whitespace and insert .* between the words:
grep(paste(strsplit(name, "\\s+")[[1]], collapse = ".*"), string, value = TRUE)
where the regex is constructed as follows:
strsplit(name, "\\s+")[[1]]
#[1] "Cry" "River"
paste(strsplit(name, "\\s+")[[1]], collapse = ".*")
#[1] "Cry.*River"
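One thing to keep in mind is that a pattern like Cry.*River only matches when the words appear in that order. If order should not matter, one possible alternative (not part of the answer above) is a set of lookaheads with perl = TRUE. The second test string and the variable names below are assumptions for illustration.

```r
string <- c("Cry Me A River", "Take me to the River or I'll Cry")
words <- strsplit("Cry River", "\\s+")[[1]]

# Ordered pattern, as constructed above
ordered <- paste(words, collapse = ".*")

# Order-independent pattern: one lookahead per word, anchored at the start
unordered <- paste0("^", paste0("(?=.*", words, ")", collapse = ""))

res_ordered <- grep(ordered, string, value = TRUE)
res_unordered <- grep(unordered, string, value = TRUE, perl = TRUE)

ordered
unordered
res_ordered
res_unordered
```

The ordered pattern keeps only "Cry Me A River", while the lookahead version keeps both strings.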
Upvotes: 3