staifmis108

Reputation: 67

How to find if a string contains certain characters without considering sequence?

I'm trying to match a name against the elements of another vector in R, but I don't know how to make grep() ignore the exact sequence (i.e. allow other words in between) when matching.

name <- "Cry River"
string <- c("Yesterday Once More","Are You happy","Cry Me A River")
grep(name, string, value = TRUE)

I expect the output to be "Cry Me A River", but I don't know how to do it.

Upvotes: 2

Views: 72

Answers (4)

Tony Ladson

Reputation: 3639

Here's an approach using stringr. Is order important? Is case important? Is it important to match whole words? If you just want to match 'Cry' and 'River' as whole words, in any order, and don't care about case:

library(stringr)

name <- "Cry River"
string <- c("Yesterday Once More",
            "Are You happy",
            "Cry Me A River",
            "Take me to the River or I'll Cry",
            "The Cryogenic River Rag",
            "Crying on the Riverside")

# Keep the strings that contain both whole words, ignoring case
string[str_detect(string, pattern = regex('\\bcry\\b', ignore_case = TRUE)) &
         str_detect(string, regex('\\bRiver\\b', ignore_case = TRUE))]
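The example above hardcodes 'cry' and 'River'. A minimal base R sketch that builds the same whole-word, case-insensitive patterns from `name` instead (the variable names `words`, `patterns` and `hits` are illustrative; `perl = TRUE` is an assumption made so that `\b` uses PCRE semantics):

```r
name <- "Cry River"
string <- c("Yesterday Once More", "Are You happy", "Cry Me A River",
            "Take me to the River or I'll Cry", "The Cryogenic River Rag",
            "Crying on the Riverside")

# One whole-word pattern per word in `name`: "\\bCry\\b", "\\bRiver\\b"
words <- strsplit(name, "\\s+")[[1]]
patterns <- paste0("\\b", words, "\\b")

# TRUE only where every pattern matches (any order, case-insensitive)
hits <- Reduce(`&`, lapply(patterns, grepl, x = string,
                           ignore.case = TRUE, perl = TRUE))
string[hits]
```

Because of the `\b` anchors, "The Cryogenic River Rag" and "Crying on the Riverside" are not returned.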

Upvotes: 0

akrun

Reputation: 886938

We can apply grepl to each word of the split string, Reduce the resulting list of logical vectors to a single logical vector with &, and use it to extract the matching elements of 'string':

string[Reduce(`&`, lapply(strsplit(name, " ")[[1]], grepl, string))]
#[1] "Cry Me A River"
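The Reduce approach does not depend on word order. A sketch illustrating this with an extra, out-of-order string; passing `fixed = TRUE` through lapply is an addition not in the original answer, assumed here so that each word is matched literally and regex metacharacters in `name` need no escaping:

```r
name <- "Cry River"
string <- c("Yesterday Once More", "Are You happy", "Cry Me A River",
            "Take me to the River or I'll Cry")

# Each word is matched as a literal substring (fixed = TRUE)
words <- strsplit(name, " ")[[1]]
hits <- Reduce(`&`, lapply(words, grepl, string, fixed = TRUE))
string[hits]
```

Since these are plain substring matches, a string such as "The Cryogenic River Rag" would also be returned.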

Alternatively, instead of strsplit, we can use sub to replace the space with .* (this requires the words to appear in the given order):

grep(sub(" ", ".*", name), string, value = TRUE)
#[1] "Cry Me A River"
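sub only replaces the first match, so for a name with more than two words only the first gap becomes `.*`. A sketch using gsub instead (the three-word `name` is a hypothetical input):

```r
name <- "Cry Me River"
string <- c("Yesterday Once More", "Are You happy", "Cry Me A River")

# sub() gives "Cry.*Me River", which does not match "Cry Me A River";
# gsub() replaces every run of whitespace
pattern <- gsub("\\s+", ".*", name)
pattern
grep(pattern, string, value = TRUE)
```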

Upvotes: 0

Tim Biegeleisen

Reputation: 520908

Here is a base R option, using grepl:

name <- "Cry River"
parts <- paste0("\\b", strsplit(name, "\\s+")[[1]], "\\b")
string <- c("Yesterday Once More","Are You happy","Cry Me A River")
result <- sapply(parts, function(x) { grepl(x, string) })
string[rowSums(result) == length(parts)]

#[1] "Cry Me A River"

The strategy here is to first split the string containing the various search terms and then generate an individual regex pattern for each term. In this case, we generate:

\bCry\b and \bRiver\b

Then we iterate over the terms, using grepl to check whether each term appears in each of the strings. Finally, we keep only the strings that contain all of the terms.
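To make the last step concrete, here is a sketch that inspects the intermediate `result` matrix (one row per string, one column per pattern) that the final filter works on. The `do.call(cbind, ...)` variant is an assumption added for robustness: when `string` has length 1, sapply returns a plain vector and rowSums fails, whereas cbind always returns a matrix.

```r
name <- "Cry River"
parts <- paste0("\\b", strsplit(name, "\\s+")[[1]], "\\b")
string <- c("Yesterday Once More", "Are You happy", "Cry Me A River")

# Logical matrix: rows are strings, columns are the patterns
result <- sapply(parts, function(x) grepl(x, string))
result

# Number of terms found in each string; keep rows where all were found
rowSums(result)
string[rowSums(result) == length(parts)]

# Same matrix (without column names), but stays 2-D for a length-1 `string`
result2 <- do.call(cbind, lapply(parts, grepl, x = string))
```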

Upvotes: 1

Ronak Shah

Reputation: 388817

Use .* in the pattern to allow any characters between the words:

grep("Cry.*River", string, value = TRUE)
#[1] "Cry Me A River"

Or, if you receive the names as-is and can't change them, you can split on whitespace and insert .* between the words:

grep(paste(strsplit(name, "\\s+")[[1]], collapse = ".*"), string, value = TRUE)

where the regex is constructed as follows:

strsplit(name, "\\s+")[[1]]
#[1] "Cry"   "River"

paste(strsplit(name, "\\s+")[[1]], collapse = ".*")
#[1] "Cry.*River"
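If the names can contain regex metacharacters (for example `+` or `(`), pasting the words in directly changes the meaning of the pattern. A sketch that wraps each word in PCRE's `\Q...\E` literal quoting, assuming `perl = TRUE` and that no word itself contains `\E` (the name "C++ River" is a hypothetical input):

```r
name <- "C++ River"
string <- c("C Me A River", "C++ Me A River")

# "\\QC++\\E.*\\QRiver\\E": each word is matched literally, in order
words <- strsplit(name, "\\s+")[[1]]
pattern <- paste0("\\Q", words, "\\E", collapse = ".*")
grep(pattern, string, value = TRUE, perl = TRUE)
```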

Upvotes: 3
