user42485

Reputation: 811

R match ignore case and special characters

I have searched and found similar answers but not exactly what I need.

I want to find matches between two character vectors, ignoring case, spaces and special characters.

list1 <- c('a', 'b', 'c')
list2 <- c('A', 'B', 'C')
list3 <- c('a-', 'B_', '- c')

All of the calls below should give the same output (1 2 3):

match(list1, list1)
match(list1, list2)
match(list1, list3)

I have tried str_detect(list1, regex(list2, ignore_case = TRUE)), but that doesn't give the same type of output, and I don't know how to handle the special characters and spaces there.

Upvotes: 0

Views: 644

Answers (2)

Allan Cameron

Reputation: 173858

You can create a regex that pulls out only the letters in the middle of the strings using gsub, and then convert them to lowercase. You can then use standard match on the result. Best to put all this in its own function:

list1 <- c('a', 'b', 'c')
list2 <- c('A', 'B', 'C')
list3 <- c('a-', 'B_', '- c')

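match2 <- function(a, b)
{
  # Capture the first run of letters; a leading greedy (.*) would leave
  # only the last letter for the capture group
  pattern <- "^[^[:alpha:]]*([[:alpha:]]+).*$"
  a <- tolower(gsub(pattern, "\\1", a))
  b <- tolower(gsub(pattern, "\\1", b))
  match(a, b)
}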

match2(list1, list1)
#> [1] 1 2 3
match2(list1, list2)
#> [1] 1 2 3
match2(list1, list3)
#> [1] 1 2 3

Created on 2020-02-21 by the reprex package (v0.3.0)
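
One thing worth checking with any capture-group approach like this is how it behaves on strings longer than one letter. When a greedy `(.*)` comes before the letter group, the letter group is left with only the last letter, while anchoring on leading non-letters captures the first full run. A small sketch, using made-up inputs that are not from the question:

```r
# Hypothetical multi-letter inputs, not from the question
x <- c("abc-", "- Def_")

# Leading greedy group: only the last letter is left for group 2
greedy <- tolower(gsub("(.*)([[:alpha:]]+)(.*)", "\\2", x))

# Anchored on leading non-letters: the first full run of letters is captured
anchored <- tolower(gsub("^[^[:alpha:]]*([[:alpha:]]+).*$", "\\1", x))

print(greedy)
#> [1] "c" "f"
print(anchored)
#> [1] "abc" "def"
```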

Upvotes: 2

dario

Reputation: 6483

I see that @Allan Cameron posted a very similar solution just before me, but I'll leave this here anyway because it is different enough.

list1 <- c('a', 'b', 'c')
list2 <- c('A', 'B', 'C')
list3 <- c('a-', 'B_', '- c')

Use a regex to replace every character that is not alphabetic with an empty string, then convert the result to lowercase:

f <- function(x) {
  return(tolower(gsub("[^[:alpha:]]", "", x)))
}

match(f(list1), f(list2))
match(f(list1), f(list3))
match(f(list2), f(list3))
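
For the question's vectors, each of the calls above should return 1 2 3. The sketch below repeats them with the output shown, and adds a hypothetical variant, `f_alnum` (my own name, not part of the answer), for the case where digits should also count as part of the key:

```r
list1 <- c('a', 'b', 'c')
list2 <- c('A', 'B', 'C')
list3 <- c('a-', 'B_', '- c')

# Strip everything that is not a letter, then lowercase
f <- function(x) tolower(gsub("[^[:alpha:]]", "", x))

res12 <- match(f(list1), f(list2))
res13 <- match(f(list1), f(list3))
print(res12)
#> [1] 1 2 3
print(res13)
#> [1] 1 2 3

# Hypothetical variant: keep digits as well as letters
f_alnum <- function(x) tolower(gsub("[^[:alnum:]]", "", x))

res_alnum <- match(f_alnum(c("a1", "b2")), f_alnum(c("B-2", "A_1")))
print(res_alnum)
#> [1] 2 1
```

Because `f` removes every non-letter, strings such as "a-1" and "a-2" would both normalise to "a", so the `[:alnum:]` version may be the safer choice if the real data contains digits.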

Upvotes: 2
