milan

Reputation: 4970

Extract pattern from string in R without distinguishing between upper and lower case letters

This is a toy example. I want to search within a and extract the colors that are listed in b, even when a color in a does not start with an upper case letter. The output should show each color exactly as it is written in a.

So the answer I would like to get is "Red" NA "blue".

library(stringr)
a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")
str_extract(a, b) # "Red" NA    NA

I used str_extract from 'stringr', but would be happy to use another function/package (e.g., grep).

Upvotes: 5

Views: 5174

Answers (5)

Matt L.

Reputation: 2964

The ignore.case option provided in @leerssej's answer was not only deprecated (as noted in the comments); it is no longer supported at all. The stringr syntax supported now is:

str_extract(a, regex(b, ignore_case = TRUE)) # "Red" NA    "blue"

Upvotes: 4

akrun

Reputation: 887118

We can do this in base R:

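unlist(sapply(tolower(b), function(x) {
    # match against the lower-cased string, but extract from the original a
    x1 <- regmatches(a, gregexpr(x, tolower(a)))
    # a pattern with no match yields character(0); turn that into NA
    replace(x1, x1 == "character(0)", NA)
}), use.names = FALSE)
# "Red"     NA "blue"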

Or, inspired by @leerssej's answer:

library(stringr)
str_extract(a, fixed(b, ignore_case=TRUE))
#[1] "Red"  NA     "blue"
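
The base R idea can also be written without lower-casing anything, since regexpr() accepts ignore.case = TRUE. The following is a minimal sketch rather than a definitive solution; the helper name extract_ci is made up for illustration, and it returns only the first match per pattern:

```r
a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")

# extract_ci is a hypothetical helper name, not part of any package
extract_ci <- function(text, patterns) {
  vapply(patterns, function(p) {
    # position of the first case-insensitive match, or -1 if there is none
    m <- regexpr(p, text, ignore.case = TRUE)
    # return the match as written in the text, or NA when nothing matched
    if (m == -1L) NA_character_ else regmatches(text, m)
  }, character(1), USE.NAMES = FALSE)
}

result <- extract_ci(a, b)
result
```

Each element of b is treated as a regular expression here, so patterns containing special characters would need escaping or fixed matching.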

Upvotes: 5

leerssej

Reputation: 14958

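Older versions of stringr had an ignore.case() function (it is no longer supported; current versions use regex(b, ignore_case = TRUE) instead):

str_extract(a, ignore.case(b)) # "Red"  NA     "blue"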

Upvotes: 5

Jota

Reputation: 17611

With stringi, one can use the case_insensitive option:

library(stringi)
stri_extract_all_fixed(a, b, opts_fixed = list(case_insensitive = TRUE))
#[[1]]
#[1] "Red"
#[[2]]
#[1] NA
#[[3]]
#[1] "blue"


# or using simplify = TRUE to get a non-list output
stri_extract_all_fixed(a, b, opts_fixed = list(case_insensitive = TRUE), 
    simplify = TRUE)
#     [,1]  
#[1,] "Red" 
#[2,] NA    
#[3,] "blue"
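
If a plain character vector is preferred over a list or matrix, one option is to ask for only the first match per pattern. This is a minimal sketch, assuming stri_extract_first_fixed() and stri_opts_fixed() from an installed stringi behave as documented:

```r
library(stringi)

a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")

# First case-insensitive fixed match for each pattern; NA where a pattern is absent
found <- stri_extract_first_fixed(
  a, b,
  opts_fixed = stri_opts_fixed(case_insensitive = TRUE)
)
found
```

Because the matching is fixed rather than regex-based, special characters in b are taken literally.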

Upvotes: 5

Dominic Comtois

Reputation: 10401

As a refinement to akrun's answer, you can use the change of case for matching, but still return elements the way they are originally written in a:

library(stringr)
a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")

positions <- str_locate(toupper(a), toupper(b))
apply(positions, 1, function(x) substr(a,x[1],x[2]))

## [1] "Red"  NA  "blue"

Or, to drop the NA:

positions <- str_locate(toupper(a), toupper(b))
words <- apply(positions, 1, function(x) substr(a,x[1],x[2]))
words[!is.na(words)]

## [1] "Red"  "blue"

Upvotes: 2
