mikew

Reputation: 377

How to compare characters of two strings at each index?

I have two strings that are the same length. I want to compare the strings element-wise and return a TRUE or FALSE for each index. For example:

string1 <- "abcd1234"
string2 <- "abcd1434"
result <- c(T, T, T, T, T, F, T, T)

So far I have the strings, and I have created character vectors by unlisting the split results, but I haven't been able to get any of R's string functions to work. I know I could use a for loop and a simple ==, but I was wondering if there is a vectorized way of doing this.

library(stringr)
str1 <- unlist(str_split(string1, ""))
str2 <- unlist(str_split(string2, ""))
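For the first case (no wildcards), a minimal base-R sketch: strsplit() splits each string into a character vector, and == then compares the two vectors element-wise, so neither a loop nor stringr is needed. It assumes both strings have the same length; with unequal lengths R recycles the shorter vector instead of failing.

```r
string1 <- "abcd1234"
string2 <- "abcd1434"

# split each string into a vector of single characters
chars1 <- strsplit(string1, "")[[1]]
chars2 <- strsplit(string2, "")[[1]]

# == is vectorized, so this compares the characters position by position
result <- chars1 == chars2
result
#[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
```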

There are also cases where one of the strings has an _ indicating that this character is essentially a wildcard and doesn't need to be checked for equality. This is why I was trying to get one of R's regex functions to work, so I could replace the _ with a wildcard.

string1 <- "abcd_234"
string2 <- "abcd1224"
result <- c(T, T, T, T, T, T, F, T)
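If a single TRUE/FALSE for the whole string is enough, the regex idea from the question can be sketched in base R: replace each _ in string1 with the regex wildcard "." and anchor the pattern with ^ and $. This assumes string1 contains no other regex metacharacters (such as ".", "*" or "("), which would need escaping first, and it answers "do the strings match?" rather than producing the per-index vector.

```r
string1 <- "abcd_234"
string2 <- "abcd1224"

# turn every "_" into the regex wildcard "." (fixed = TRUE matches "_" literally)
pattern <- paste0("^", gsub("_", ".", string1, fixed = TRUE), "$")

# TRUE only if string2 agrees with string1 everywhere except the wildcard positions
grepl(pattern, string2)
#[1] FALSE

grepl(pattern, "abcd1234")
#[1] TRUE
```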

Upvotes: 3

Views: 3509

Answers (3)

SilSur

Reputation: 509

I know this was answered a long time ago, but I thought I'd submit a handy copy-paste version for R beginners. So here is @d.b's answer in a more beginner-friendly way:

f.check.string.equality <- function(s1, s2) {
  # split both strings into characters and stack them as rows of a matrix;
  # a column matches if its non-"_" characters are all identical
  resEqualCheck <- apply(do.call(rbind, strsplit(c(s1, s2), "")), 2, function(x) {
    length(unique(x[!x %in% "_"])) == 1
  })
  # the strings are equal only if every position matches
  isEqual <- all(resEqualCheck)
  return(list(isEqual = isEqual, charsResult = resEqualCheck))
}

Then you simply call the function with the input strings you want to compare as follows:

strComp1 = f.check.string.equality("TestStr", "teststr")
strComp2 = f.check.string.equality(tolower("TestStr"), "teststr")

... the results of which are as follows:

> strComp1$isEqual
[1] FALSE

> strComp1$charsResult
[1] FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE

> strComp2$isEqual
[1] TRUE

> strComp2$charsResult
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE

... and now you're finally a happy camper. :)

Upvotes: 0

d.b

Reputation: 32548

apply(do.call(rbind, strsplit(c(string1, string2), "")), 2, function(x){
    length(unique(x[!x %in% "_"])) == 1
})
#[1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
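One edge case worth checking with the apply() version: if both strings have _ at the same position, every character in that column is filtered out, unique() returns a zero-length vector, and == 1 reports FALSE even though both sides are wildcards. Changing the test to <= 1 treats such a column as a match. A minimal sketch with hypothetical example strings:

```r
s <- c("ab_d", "ab_d")

# one row per string, one column per character position
m <- do.call(rbind, strsplit(s, ""))

# original test: a column containing only "_" leaves nothing to compare, so it gives FALSE
orig_check <- apply(m, 2, function(x) length(unique(x[!x %in% "_"])) == 1)
orig_check
#[1]  TRUE  TRUE FALSE  TRUE

# relaxed test: zero remaining characters (all wildcards) also counts as a match
fixed_check <- apply(m, 2, function(x) length(unique(x[!x %in% "_"])) <= 1)
fixed_check
#[1] TRUE TRUE TRUE TRUE
```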

You could also slightly modify Rich's deleted answer:

Reduce(f = function(s1, s2){
    s1 == s2 | s1 == "_" | s2 == "_"
},
x = strsplit(c(string1, string2), ""))
#[1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE

Note that the first approach also allows comparing more than two strings.
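To illustrate that note, a sketch with a hypothetical third string: strsplit() returns one character vector per string, rbind stacks them as rows, and each column is then checked across all strings at once. This assumes all strings have the same length.

```r
strings <- c("abcd_234", "abcd1224", "abcd12_4")

# rows = strings, columns = character positions
m <- do.call(rbind, strsplit(strings, ""))

# a position matches if its non-"_" characters are all the same
res <- apply(m, 2, function(x) length(unique(x[!x %in% "_"])) == 1)
res
#[1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```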

Upvotes: 4

Dave2e

Reputation: 24079

Here is a brute-force method. I am using str_locate_all to find all of the "_" in both strings and setting those positions to TRUE, to take the wildcard nature of the problem into account.

library(stringr)
string1 <- "abcd_234"
string2 <- "abcd1224"

str1 <- str_split(string1, "")[[1]]
str2 <- str_split(string2, "")[[1]]

#compare characters one by one
result <- str1 == str2

#Correct for wildcards in both strings
result[str_locate_all(string1, "_")[[1]][, 1]] <- TRUE
result[str_locate_all(string2, "_")[[1]][, 1]] <- TRUE

result
#[1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
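Because str1 and str2 are already split into single characters, the wildcard positions can also be found with a plain logical comparison instead of str_locate_all(). A minimal variant of the same idea, using base strsplit() in place of str_split() so it runs without stringr:

```r
string1 <- "abcd_234"
string2 <- "abcd1224"

str1 <- strsplit(string1, "")[[1]]
str2 <- strsplit(string2, "")[[1]]

# compare characters one by one
result <- str1 == str2

# positions where either string holds the "_" wildcard always count as a match
result[str1 == "_" | str2 == "_"] <- TRUE

result
#[1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```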

Upvotes: 2
