Reputation: 811
I have searched and found similar answers but not exactly what I need.
I want to find matches between two character vectors, ignoring case, spaces, and special characters.
list1 <- c('a', 'b', 'c')
list2 <- c('A', 'B', 'C')
list3 <- c('a-', 'B_', '- c')
All of the calls below should give the same output (1 2 3):
match(list1, list1)
match(list1, list2)
match(list1, list3)
I have tried str_detect(list1, regex(list2, ignore_case = TRUE)), but that doesn't give the same type of output, and I don't know how to handle the special characters and spaces there.
Upvotes: 0
Views: 644
Reputation: 173858
You can create a regex that pulls out only the letters in the middle of the strings using gsub, and then convert them to lowercase. You can then use standard match on the result. Best to put all this in its own function:
list1 <- c('a', 'b', 'c')
list2 <- c('A', 'B', 'C')
list3 <- c('a-', 'B_', '- c')
match2 <- function(a, b)
{
  # Skip leading non-letters, capture the first run of letters, drop the rest
  pattern <- "^[^[:alpha:]]*([[:alpha:]]+).*$"
  a <- tolower(gsub(pattern, "\\1", a))
  b <- tolower(gsub(pattern, "\\1", b))
  match(a, b)
}
match2(list1, list1)
#> [1] 1 2 3
match2(list1, list2)
#> [1] 1 2 3
match2(list1, list3)
#> [1] 1 2 3
Created on 2020-02-21 by the reprex package (v0.3.0)
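If it helps, here is a minimal sketch of what happens when an element has no counterpart, and how to get a logical result closer in shape to what str_detect returns. The helper name first_letters and the test vectors x and tbl are made up for illustration; they are not part of the answer above.

```r
# Hypothetical helper: keep the first run of letters in each string,
# then lower-case it. Strings with no letters are returned unchanged.
first_letters <- function(x) {
  tolower(sub("^[^[:alpha:]]*([[:alpha:]]+).*$", "\\1", x))
}

x   <- c("a", "b", "z")        # "z" has no counterpart in tbl
tbl <- c("a-", "B_", "- c")

# Positions in tbl; match() returns NA where nothing matches
idx <- match(first_letters(x), first_letters(tbl))
idx
#> [1]  1  2 NA

# Logical version: TRUE where a match was found
found <- !is.na(idx)
found
#> [1]  TRUE  TRUE FALSE
```

Checking for NA before indexing with the result avoids silently picking up missing rows.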
Upvotes: 2
Reputation: 6483
I see that @Allan Cameron posted a very similar solution right before me. I'm going to leave this up anyway because it is different enough.
list1 <- c('a', 'b', 'c')
list2 <- c('A', 'B', 'C')
list3 <- c('a-', 'B_', '- c')
Use a regex to replace every character that is not alphabetic with an empty string:
f <- function(x) {
return(tolower(gsub("[^[:alpha:]]", "", x)))
}
match(f(list1), f(list2))
match(f(list1), f(list3))
match(f(list2), f(list3))
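For reference, a short sketch of what these calls return, plus one behaviour worth knowing about. It redefines f so it runs on its own; the "New-York" strings are invented test data, not from the question.

```r
# Same idea as f above: strip everything that is not a letter, then lower-case
f <- function(x) {
  tolower(gsub("[^[:alpha:]]", "", x))
}

list1 <- c('a', 'b', 'c')
list3 <- c('a-', 'B_', '- c')

res <- match(f(list1), f(list3))
res
#> [1] 1 2 3

# Symbols and spaces inside a string are removed too,
# so differently punctuated spellings collapse to the same key
keys <- f(c("New-York", "new york", "N.Y."))
keys
#> [1] "newyork" "newyork" "ny"
```

Note that [^[:alpha:]] also removes digits. If digits should count towards a match, [^[:alnum:]] may be the better character class, depending on the data.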
Upvotes: 2