Chris

Reputation: 1237

Match strings by row but ignore character order or special characters

I have data like this:

library(dplyr)  

Data <- tibble(
      Name1 = c("PlaceA, PlaceB & PlaceC", "PlaceD and PlaceE", "PlaceF.", "PlaceG & PlaceH", "Place K-Place L", "Place M and Place N","PlaceP-PlaceQ"),
      Name2 = c("PlaceB, PlaceA & PlaceC", "PlaceD & PlaceE", "PlaceF","PlaceG & PlaceJ", "Place L-Place K", "Place N and Place M","PlaceP-PlaceR")) 
  

I would like to compare the two columns row by row to see whether they are the same, while ignoring 1) the order of the words, 2) the characters used to separate the words, and 3) whether '&' has been used instead of 'and'.

The desired output looks like this:

Data %>% mutate(Match = c("TRUE","TRUE","TRUE","FALSE","TRUE","TRUE","FALSE"))

I'm sure there must be a way of using stringr to do this, but I can't find it.

Edit: @akrun noticing a typo in my dummy data made me think about typos in my real data. If two words differ by only one letter (either an additional letter or a mistyped letter), they are probably the same and should match. If a word has the same letters but in a different order, it should not match. Something like this:

Mispellings <- tibble(
      Name1 = c("Location","Place","Racecar"),
      Name2 = c("Locatione","Pluce","Carrace"),
      Match = c("TRUE", "TRUE", "FALSE"))

Can any solution for my original question also deal with this additional scenario?

Upvotes: 1

Views: 78

Answers (1)

akrun

Reputation: 887213

One option is to split each string into a list of names, sort each list element, and then compare the corresponding elements pairwise:

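# Split on a separator (",", "&", ".", "-") or on the standalone word "and".
# Requiring whitespace around "and" avoids splitting inside names that contain it.
pat <- "\\s*[,&.-]\\s*|\\s+and\\s+"
lst1 <- lapply(strsplit(Data$Name1, pat), sort)
lst2 <- lapply(strsplit(Data$Name2, pat), sort)
# identical() also copes with rows that split into different numbers of pieces
mapply(identical, lst1, lst2)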
[1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE

Or use setequal, which ignores order (and duplicate elements), so no sorting is needed:

do.call(mapply, c(FUN = setequal, unname(lapply(Data, 
    function(x) strsplit(x, "\\s*[,&.-]\\s*|\\s*and\\s*")))))
[1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE
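
For the follow-up about typos, here is a rough sketch rather than a definitive solution. It assumes that "one letter difference" can be approximated by a Levenshtein edit distance of at most 1, computed with base R's adist(). The helper name fuzzy_match, the max_dist argument, and the name1/name2 vectors are made up for illustration, and the split pattern is similar to the one used above.

```r
# Separators: ",", "&", ".", "-" or the standalone word "and" (assumed pattern)
pat <- "\\s*[,&.-]\\s*|\\s+and\\s+"

# Hypothetical helper: TRUE if both strings split into the same number of
# pieces and every piece has a partner on the other side within max_dist edits.
fuzzy_match <- function(a, b, max_dist = 1) {
  x <- strsplit(a, pat)[[1]]
  y <- strsplit(b, pat)[[1]]
  if (length(x) != length(y)) return(FALSE)
  close <- utils::adist(x, y) <= max_dist  # logical matrix of "near" pairs
  all(rowSums(close) > 0) && all(colSums(close) > 0)
}

name1 <- c("Location", "Place", "Racecar")
name2 <- c("Locatione", "Pluce", "Carrace")
res <- mapply(fuzzy_match, name1, name2, USE.NAMES = FALSE)
res
# [1]  TRUE  TRUE FALSE
```

Note the trade-off: with this tolerance, names that legitimately differ by one letter are also treated as equal, so rows such as "PlaceG & PlaceH" vs "PlaceG & PlaceJ" in the original dummy data would come out TRUE. Whether that is acceptable depends on how similar the real place names are.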

Upvotes: 1
