Reputation: 372
I used grepl to check whether a string contains any pattern from a set of patterns (I joined the patterns with '|'). Swapping the arguments didn't help. How can I identify which patterns in the set actually match?
Additional information: This can be solved by writing a loop, but that is very time-consuming because my set has > 100,000 strings. Can it be optimized?
Example: let the string be
a <- "Hello"
pattern <- c("ll", "lo", "hl")
pattern1 <- paste(pattern, collapse="|") # "ll|lo|hl"
grepl(a, pattern=pattern1) # returns TRUE
grepl(pattern, pattern=a) # returns FALSE once per pattern (3 times here)
Upvotes: 4
Views: 3197
Reputation: 32456
You can also use base R with a lookahead expression, (?=), since the patterns overlap. With gregexpr you can extract the match location for each grouped pattern as a matrix.
## changed your string so the second pattern matches twice
a <- "Hellolo"
pattern <- c("ll", "lo", "hl")
pattern1 <- sprintf("(?=(%s))", paste(pattern, collapse=")|(")) # "(?=(ll)|(lo)|(hl))"
attr(gregexpr(pattern1, a, perl=TRUE)[[1]], "capture.start")
# [1,] 3 0 0
# [2,] 0 4 0
# [3,] 0 6 0
Each column of the matrix corresponds to one pattern and each row to one match, so pattern 1 matched at position 3 and pattern 2 matched at positions 4 and 6. Pattern 3 never matched, so its column is all zeros.
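To go from that matrix to the patterns themselves, one option is to check which columns contain a positive start position. This is a minimal sketch reusing the names from the code above; `starts` and `matched` are names introduced here for illustration. One caveat: at each position the alternation records only the first alternative that matches, so if two patterns can start at the same position, only the earlier one is reported. Patterns containing regex metacharacters or their own groups would also need escaping first.

```r
a <- "Hellolo"
pattern <- c("ll", "lo", "hl")

# Same lookahead regex as above: one capture group per pattern
pattern1 <- sprintf("(?=(%s))", paste(pattern, collapse = ")|("))

# Rows are matches, columns are capture groups (one per pattern)
starts <- attr(gregexpr(pattern1, a, perl = TRUE)[[1]], "capture.start")

# A pattern matched if its column holds at least one positive start position
matched <- pattern[colSums(starts > 0) > 0]
matched
# [1] "ll" "lo"
```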
Upvotes: 1
Reputation: 31181
You are looking for str_detect from the stringr package:
library(stringr)
str_detect(a, pattern)
#[1] TRUE TRUE FALSE
In case you have multiple strings, like a = c('hello','hola','plouf'), you can do:
lapply(a, function(u) pattern[str_detect(u, pattern)])
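If stringr is not available, roughly the same result can be sketched in base R by looping over the patterns instead of the strings, so each grepl call is vectorised over all of the strings at once. This assumes the patterns are literal substrings (hence fixed = TRUE) and that there is more than one string; `hits` and `matched` are names introduced here for illustration. Whether this is faster for > 100,000 strings depends on the data and has not been benchmarked here.

```r
a <- c("hello", "hola", "plouf")
pattern <- c("ll", "lo", "hl")

# Logical matrix: one row per string, one column per pattern.
# fixed = TRUE treats each pattern as a literal substring, not a regex.
hits <- vapply(pattern, function(p) grepl(p, a, fixed = TRUE),
               logical(length(a)))

# For each string, keep the patterns whose column is TRUE in that row
matched <- lapply(seq_along(a), function(i) pattern[hits[i, ]])
names(matched) <- a
matched
```

Strings that contain none of the patterns come back as character(0), which matches the behaviour of the lapply call above.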
Upvotes: 8