Reputation: 389
I have the following data frame:
> my.data
A.Seats B.Seats
1 14,15 14,15,16
2 7 7,8
3 12,13 16,17
4 <NA> 10,11
I would like to check if the string within any row in column "A.Seats" is found within the same row of column "B.Seats". So the output would look something like this:
A.Seats B.Seats Check
1 14,15 14,15,16 TRUE
2 7 7,8 TRUE
3 12,13 16,17 FALSE
4 <NA> 10,11 FALSE
But I don't know how to create this table. As a start, I tried using grep:
grep(my.data$A.Seats,my.data$B.Seats)
But I receive the following output
[1] 1
Warning message:
In grep(my.data$A.Seats, my.data$B.Seats) :
argument 'pattern' has length > 1 and only the first element will be used
...and I can't get past this error. Any ideas as to how I can get the intended result?
Many Thanks
Upvotes: 3
Views: 10429
Reputation: 193517
The "stringi" library has some vectorized functions that might be useful for something like this. I would suggest the stri_detect()
function. Here's an example with some reproducible sample data. Note the difference in the values in the first and last row, and the difference in the results according to whether a regex
or fixed
approach was taken:
my.data <- data.frame(
A.Seats = c("14,15", "7", "12,13", NA, "14,19"),
B.Seats = c("14,15,16", "7,8", "16,17", "10,11", "14,15,16"))
my.data
# A.Seats B.Seats
# 1 14,15 14,15,16
# 2 7 7,8
# 3 12,13 16,17
# 4 <NA> 10,11
# 5 14,19 14,15,16
library(stringi)
stri_detect(my.data$B.Seats, fixed = my.data$A.Seats)
# [1] TRUE TRUE FALSE NA FALSE
stri_detect(my.data$B.Seats, regex = gsub(",", "|", my.data$A.Seats))
# [1] TRUE TRUE FALSE NA TRUE
The first option above treats the values in my.data$A.Seats
as a fixed string pattern. The second option treats it as a regular expression to match any of the values.
Note that this maintains NA
as NA
, but that can easily be changed to FALSE
if you need to.
If you don't want to think too much about mapply
, you can consider Vectorize
to make a vectorized version of grepl
. Something like the following should do it:
vGrepl <- Vectorize(grepl)
vGrepl(my.data$A.Seats, my.data$B.Seats) # pattern is fixed
# [1] 1 1 0 NA 0
vGrepl(gsub(",", "|", my.data$A.Seats), my.data$B.Seats) # pattern is regex
# 14|15 7 12|13 <NA> 14|19
# 1 1 0 NA 1
as.logical(vGrepl(my.data$A.Seats, my.data$B.Seats)) # coerce to logical
# [1] TRUE TRUE FALSE NA FALSE
Because this calls grepl
on each element in the vector, I don't think this will scale well though.
Upvotes: 2
Reputation: 61154
This is an approach to get what you need
> List <- lapply(my.data, function(x) strsplit(as.character(x), ","))
> transform(my.data, Check=sapply(mapply("%in%", List[[1]], List[[2]]), any))
A.Seats B.Seats Check
1 14,15 14,15,16 TRUE
2 7 7,8 TRUE
3 12,13 16,17 FALSE
4 <NA> 10,11 FALSE
Here's an alternative using grep
>transform(my.data,
Check=sapply(suppressWarnings(mapply("grep", List[[1]], List[[2]])), any))
Upvotes: 1