Reputation: 21
I am trying to check whether the value of one (string) variable matches the value of any of several other (string) variables in an R dataframe. If there is at least one match, I would like to return TRUE; if not, I would like to return FALSE.
Consider this toy dataframe:
toydf<-data.frame(
base1=c("DOG","CAT","MOUSE"),
base2=c("FISH","RAT","BUNNY"),
target=c("DOG","HORSE","BUNNY"),
stringsAsFactors=FALSE)
  base1 base2 target
1   DOG  FISH    DOG
2   CAT   RAT  HORSE
3 MOUSE BUNNY  BUNNY
I want to compare the values in target with those in both base1 and base2 and return TRUE if there is at least one match, and FALSE otherwise:
  base1 base2 target check
1   DOG  FISH    DOG  TRUE
2   CAT   RAT  HORSE FALSE
3 MOUSE BUNNY  BUNNY  TRUE
In this simple and small example, I know this can be easily achieved using:
toydf$check<-toydf$target==toydf$base1 | toydf$target==toydf$base2
However, in the actual dataset, I have a very large number of base variables against which to check for matches, so I'd like to avoid repeating these | statements.
I've attempted to achieve this using %in% but in order to do that, I first have to collect the values of base1 and base2 in a list or vector:
toydf$baseall<-apply(toydf[1:2],1,function(x) list(x))
toydf$check<-toydf$target %in% toydf$baseall
However, this returns a vector with all values set to FALSE. I suspect this has something to do with the way the list is created in the dataframe, but I am not sure how to solve this.
Any help would be appreciated. Thank you.
Upvotes: 2
Views: 386
Reputation: 23818
Here's another possibility:
toydf$check <- as.logical(rowSums(toydf==toydf$target)-1)
#> toydf
# base1 base2 target check
#1 DOG FISH DOG TRUE
#2 CAT RAT HORSE FALSE
#3 MOUSE BUNNY BUNNY TRUE
This code counts, for each row of the dataframe, the cases where an entry is equal to that specified in the column toydf$target. Since we did not exclude this target column from the dataframe, the sum is always at least one (the entry in the target column is obviously equal to itself), hence we need to correct this by subtracting 1. The result for each row is then converted into a Boolean FALSE or TRUE depending on whether the calculated value is zero (no entry in the other columns is equal to that in the target column) or not, respectively.
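If subtracting 1 feels fragile (for example, when the dataframe holds other columns that should not be compared), one variant is to restrict the comparison to the base columns only. This is a minimal sketch, assuming the base columns can be identified by the prefix "base"; the name base_cols and the grep pattern are illustrative choices, not something taken from the real dataset:

```r
# Toy data from the question
toydf <- data.frame(
  base1 = c("DOG", "CAT", "MOUSE"),
  base2 = c("FISH", "RAT", "BUNNY"),
  target = c("DOG", "HORSE", "BUNNY"),
  stringsAsFactors = FALSE)

# Assumption: every column to check starts with "base"
base_cols <- grep("^base", names(toydf), value = TRUE)

# Logical matrix: each base column compared element-wise with target
matches <- toydf[base_cols] == toydf$target

# TRUE when at least one base column matches in that row;
# na.rm = TRUE treats NA comparisons as non-matches
toydf$check <- unname(rowSums(matches, na.rm = TRUE) > 0)
```

Because the target column is left out of the comparison, no correction is needed, and the same lines work unchanged for any number of base columns that the pattern picks up.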
Hope this helps.
Upvotes: 2
Reputation: 898
# how about (using == rather than %in%, so each row is compared only with its own target value):
bool <- toydf[, 1:2] == toydf$target
toydf$check <- apply(bool, 1, any)
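One thing worth keeping in mind with any %in%-based approach: x %in% table tests each element of x against the whole of table, not against the element in the same row. On the toy data the two happen to agree, but they can differ. A small sketch with a hypothetical two-row dataframe (df2 is made up for illustration and is not part of the question):

```r
# Hypothetical data where membership and row-wise equality disagree
df2 <- data.frame(
  base1 = c("DOG", "DOG"),
  target = c("DOG", "CAT"),
  stringsAsFactors = FALSE)

# Membership in the entire target column: "DOG" appears somewhere in target
in_check <- df2$base1 %in% df2$target

# Row-wise comparison: only row 1 has base1 equal to its own target
eq_check <- df2$base1 == df2$target
```

Here in_check is TRUE for both rows, while eq_check is TRUE only for the first, which is the row-wise behaviour the question asks for.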
Upvotes: 0