Reputation: 377
I have two strings that are the same length. I want to compare the strings element-wise and return a TRUE or FALSE for each index. For example:
string1 <- "abcd1234"
string2 <- "abcd1434"
result <- c(T, T, T, T, T, F, T, T)
So far I have split both strings into character vectors, but I haven't been able to get any of the string functions in R to work. I know I could use a for loop and do a simple ==, but I was wondering whether there is a vectorized way of doing this.
library(stringr)
str1 <- unlist(str_split(string1, ""))
str2 <- unlist(str_split(string2, ""))
There are also cases where one of the strings will contain a _, indicating that this character is essentially a wildcard and doesn't need to be checked for equality. This is why I was trying to get one of the regex functions in R to work, so that I could replace the _ with a wildcard.
string1 <- "abcd_234"
string2 <- "abcd1224"
result <- c(T, T, T, T, T, T, F, T)
Upvotes: 3
Views: 3509
Reputation: 509
I know this was answered a long time ago, but I thought I'd submit a handy copy-paste version for R beginners. So here is @d.b's answer in a more beginner-friendly form:
f.check.string.equality <- function(s1, s2) {
  # Compare the two strings character by character; "_" acts as a wildcard
  resEqualCheck <- apply(do.call(rbind, strsplit(c(s1, s2), "")), 2, function(x) {
    length(unique(x[!x %in% "_"])) == 1
  })
  # The strings are equal overall only if every position matches
  isEqual <- all(resEqualCheck)
  return(list(isEqual = isEqual, charsResult = resEqualCheck))
}
Then you simply call the function with the input strings you want to compare as follows:
strComp1 = f.check.string.equality("TestStr", "teststr")
strComp2 = f.check.string.equality(tolower("TestStr"), "teststr")
... the results of which are as follows:
> strComp1$isEqual
[1] FALSE
> strComp1$charsResult
[1] FALSE TRUE TRUE TRUE FALSE TRUE TRUE
> strComp2$isEqual
[1] TRUE
> strComp2$charsResult
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
... and now you're finally a happy camper. :)
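If you also want to know where the strings differ, the per-character vector can be passed to which(). The following is only a minimal sketch: it hard-codes the charsResult vector shown above for "TestStr" vs "teststr", and the variable names are my own.

```r
# Per-character result copied from the strComp1$charsResult output above
charsResult <- c(FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)

# Positions at which the two strings differ
mismatch <- which(!charsResult)
mismatch

# The offending characters of the first string at those positions
strsplit("TestStr", "")[[1]][mismatch]
```

Here the mismatches are at positions 1 and 5, i.e. the two upper-case letters.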
Upvotes: 0
Reputation: 32548
apply(do.call(rbind, strsplit(c(string1, string2), "")), 2, function(x) {
  length(unique(x[!x %in% "_"])) == 1
})
#[1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
You could also slightly modify Rich's deleted answer:
Reduce(f = function(s1, s2) {
  s1 == s2 | s1 == "_" | s2 == "_"
},
x = strsplit(c(string1, string2), ""))
#[1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
Note that the first approach also allows comparison of more than two strings.
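As a rough sketch of the multi-string case, the same apply() idea can be run over three strings. The third string below is made up for illustration. I have also changed == 1 to <= 1 here, which is my own variation: with == 1, a position where every string has "_" leaves zero distinct characters and comes out FALSE, which may or may not be what you want.

```r
# Compare any number of equal-length strings position by position.
# "_" is treated as a wildcard in every string.
strings <- c("abcd_234", "abcd1224", "abcd12_4")  # third string is hypothetical

# One row per string, one column per character position
chars <- do.call(rbind, strsplit(strings, ""))

result <- apply(chars, 2, function(x) {
  # A position matches if at most one distinct non-wildcard character remains;
  # <= 1 (rather than == 1) also accepts positions that are all wildcards
  length(unique(x[x != "_"])) <= 1
})
result
```

This assumes all strings have the same length, as stated in the question.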
Upvotes: 4
Reputation: 24079
Here is a brute-force method. I am using str_locate_all to find all of the "_" characters in both strings and setting those positions to TRUE, to account for the wildcard nature of the problem.
library(stringr)
string1 <- "abcd_234"
string2 <- "abcd1224"
str1 <- str_split(string1, "")[[1]]
str2 <- str_split(string2, "")[[1]]
# compare characters one by one
result <- str1 == str2
# correct for wildcards in both strings
result[str_locate_all(string1, "_")[[1]][, 1]] <- TRUE
result[str_locate_all(string2, "_")[[1]][, 1]] <- TRUE
result
#[1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
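If you would rather not depend on stringr, the same override step can probably be written in base R with logical indexing. This is only a sketch of that variation, using strsplit() in place of str_split() and a logical mask in place of str_locate_all():

```r
string1 <- "abcd_234"
string2 <- "abcd1224"

# Split each string into a vector of single characters (base R)
str1 <- strsplit(string1, "")[[1]]
str2 <- strsplit(string2, "")[[1]]

# Compare characters one by one
result <- str1 == str2

# Wherever either string has a wildcard, force the result to TRUE
result[str1 == "_" | str2 == "_"] <- TRUE
result
```

Like the version above, this assumes both strings have the same number of characters.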
Upvotes: 2