Reputation: 4970
This is a toy example. I want to search within a and extract the colors that are listed in b. Even if a color does not start with an upper-case letter, I want to extract it. However, the output should tell me how the color appears in a.
So the answer I would like to get is "Red" NA "blue".
library(stringr)
a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")
str_extract(a, b) # "Red" NA NA
I used str_extract from 'stringr', but would be happy to use another function/package (e.g., grep).
Upvotes: 5
Views: 5174
Reputation: 2964
The ignore.case() function suggested in @leerssej's answer was deprecated (as noted in the comments) and is now no longer supported at all.
The syntax stringr supports now is:
str_extract(a, regex(b, ignore_case = TRUE)) # "Red" NA "blue"
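Case-insensitive matching can also be requested with the inline (?i) flag at the start of each pattern. ICU, the regex engine behind stringr, accepts this flag, so str_extract(a, paste0("(?i)", b)) should behave like the call above. The sketch below uses base R's PCRE engine (perl = TRUE) instead, so it runs without stringr; it assumes the entries of b contain no regex metacharacters.

```r
a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")

# Prefix each pattern with (?i) so PCRE (perl = TRUE) ignores case,
# then return the first match as written in a, or NA if there is none.
res <- vapply(b, function(p) {
  m <- regexpr(paste0("(?i)", p), a, perl = TRUE)
  if (m == -1) NA_character_ else regmatches(a, m)
}, character(1), USE.NAMES = FALSE)

res  # c("Red", NA, "blue")
```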
Upvotes: 4
Reputation: 887118
We can do this in base R:
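unlist(sapply(tolower(b), function(x) {
  # positions are found in tolower(a) but extracted from a, so the original case is kept
  x1 <- regmatches(a, gregexpr(x, tolower(a)))
  # patterns with no match give character(0); turn those into NA
  replace(x1, lengths(x1) == 0, NA)}), use.names = FALSE)
# "Red" NA "blue"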
Or, inspired by @leerssej's answer:
library(stringr)
str_extract(a, fixed(b, ignore_case=TRUE))
#[1] "Red" NA "blue"
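For comparison, base R's regexpr() has an ignore.case argument, so the input does not need to be lowercased at all. This is a minimal sketch that assumes only the first match per color is wanted; first_match is a hypothetical helper name, and the entries of b are treated as regular expressions (harmless here, since plain color names contain no metacharacters).

```r
a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")

# Hypothetical helper: first case-insensitive match of pattern in text,
# returned as it appears in text, or NA when there is no match.
first_match <- function(pattern, text) {
  m <- regexpr(pattern, text, ignore.case = TRUE)
  if (m == -1) NA_character_ else regmatches(text, m)
}

found <- vapply(b, first_match, character(1), text = a, USE.NAMES = FALSE)
found  # c("Red", NA, "blue")
```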
Upvotes: 5
Reputation: 14958
stringr has an ignore.case() function:
str_extract(a, ignore.case(b)) # "Red" NA "blue"
Upvotes: 5
Reputation: 17611
With stringi, one can use the case_insensitive option:
library(stringi)
stri_extract_all_fixed(a, b, opts_fixed = list(case_insensitive = TRUE))
#[[1]]
#[1] "Red"
#[[2]]
#[1] NA
#[[3]]
#[1] "blue"
# or using simplify = TRUE to get a non-list output
stri_extract_all_fixed(a, b, opts_fixed = list(case_insensitive = TRUE),
simplify = TRUE)
# [,1]
#[1,] "Red"
#[2,] NA
#[3,] "blue"
Upvotes: 5
Reputation: 10401
As a refinement of akrun's answer, you can change the case for matching but still return the elements as they were originally written in a:
library(stringr)
a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")
positions <- str_locate(toupper(a), toupper(b))
apply(positions, 1, function(x) substr(a,x[1],x[2]))
## [1] "Red" NA "blue"
Or, to drop the NA values:
positions <- str_locate(toupper(a), toupper(b))
words <- apply(positions, 1, function(x) substr(a,x[1],x[2]))
words[!is.na(words)]
## [1] "Red" "blue"
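The same position-based idea can be sketched in base R without stringr. Because fixed = TRUE overrides ignore.case in regexpr(), this version lowercases both strings, searches for a literal match, and then cuts the match out of the original a. extract_ci is a hypothetical helper name, and the approach assumes tolower() does not change the number of characters, which holds for plain ASCII text like this example.

```r
a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")

# Hypothetical helper: literal, case-insensitive search that returns
# each match with the casing it has in the original text.
extract_ci <- function(text, patterns) {
  vapply(patterns, function(p) {
    # Locate the literal pattern in the lowercased text
    pos <- regexpr(tolower(p), tolower(text), fixed = TRUE)
    if (pos == -1) return(NA_character_)
    # Cut the match out of the original, un-lowercased text
    substr(text, pos, pos + attr(pos, "match.length") - 1)
  }, character(1), USE.NAMES = FALSE)
}

words <- extract_ci(a, b)
words                 # c("Red", NA, "blue")
words[!is.na(words)]  # c("Red", "blue")
```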
Upvotes: 2