Reputation: 133
In R, how do you check whether a string contains a substring that is not in a given list? For example, imagine you have the string vector fruits <- c('apple,pear,orange', 'apple,pear', 'apple,banana', 'apple'), and you want a function that tells you whether each element contains a fruit that is not apple or pear. In the example, it would be something like
fruits <- c('apple,pear,orange', 'apple,pear',
'apple,banana', 'apple', 'pear,apple')
other_fruits(fruits)
# [1] TRUE FALSE TRUE FALSE FALSE
Upvotes: 0
Views: 125
Reputation: 93851
If your strings are always fruit names separated by commas, you can do this without a regular expression by splitting each string on the commas, as in the example below. The same method can also be adapted to use a regex.
fruits <- c('apple,pear,orange', 'apple,pear',
'apple,banana', 'apple', 'pear,apple')
sapply(strsplit(fruits,","), function(x) !all(x %in% c("apple","pear")))
[1] TRUE FALSE TRUE FALSE FALSE
Or, in general:
other_fruits <- function(string, fruit_check) {
  sapply(strsplit(string, ","), function(x) !all(x %in% fruit_check))
}
other_fruits(fruits, c("apple","pear"))
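For the regex variant mentioned above, one possible sketch is to build a pattern that matches only strings made up entirely of the allowed fruits, then negate the match. The name other_fruits_regex is hypothetical (it is not from the answer), and the sketch assumes fruit names contain no regex metacharacters and no spaces around the commas.

```r
# Hypothetical helper (not from the original answer): TRUE where a string
# contains any comma-separated item outside fruit_check.
# Assumes fruit names contain no regex metacharacters or surrounding spaces.
other_fruits_regex <- function(string, fruit_check) {
  one_fruit <- paste0("(", paste(fruit_check, collapse = "|"), ")")
  # Matches strings consisting only of allowed fruits separated by commas
  only_allowed <- paste0("^", one_fruit, "(,", one_fruit, ")*$")
  !grepl(only_allowed, string)
}

fruits <- c('apple,pear,orange', 'apple,pear',
            'apple,banana', 'apple', 'pear,apple')
other_fruits_regex(fruits, c("apple", "pear"))
# [1]  TRUE FALSE  TRUE FALSE FALSE
```

Anchoring the pattern with ^ and $ keeps a name such as "pineapple" from being accepted just because it contains "apple".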
Or, say you want to return the fruits other than the chosen fruits:
other_fruits <- function(string, fruit_check) {
  lapply(strsplit(string, ","), function(x) {
    if (all(x %in% fruit_check)) NA else x[!(x %in% fruit_check)]
  })
}
other_fruits(fruits, "apple")
[[1]]
[1] "pear"   "orange"

[[2]]
[1] "pear"

[[3]]
[1] "banana"

[[4]]
[1] NA

[[5]]
[1] "pear"
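If an empty result is preferable to NA, a shorter variant can be sketched with base R's setdiff. The name other_fruits_list is hypothetical, and note that setdiff also drops duplicate items within an element, which the version above does not.

```r
# Hypothetical variant (not from the original answer): returns character(0)
# instead of NA when an element contains only the chosen fruits.
other_fruits_list <- function(string, fruit_check) {
  lapply(strsplit(string, ",", fixed = TRUE), setdiff, y = fruit_check)
}

fruits <- c('apple,pear,orange', 'apple,pear',
            'apple,banana', 'apple', 'pear,apple')
res <- other_fruits_list(fruits, "apple")

# Elements with at least one other fruit have a non-empty result
lengths(res) > 0
# [1]  TRUE  TRUE  TRUE FALSE  TRUE
```

Returning character(0) means lengths() can recover the TRUE/FALSE answer directly, without a separate NA check.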
Upvotes: 2
Reputation: 2469
You can create an index to see where the fruits appear like this:
fruits <- c('apple,pear,orange', 'apple,pear',
'apple,banana', 'apple', 'pear,apple', 'mango')
str <- unique(unlist(strsplit(fruits,",")))
dat <- sapply(str, grepl, fruits)
dat
     apple  pear orange banana mango
[1,]  TRUE  TRUE   TRUE  FALSE FALSE
[2,]  TRUE  TRUE  FALSE  FALSE FALSE
[3,]  TRUE FALSE  FALSE   TRUE FALSE
[4,]  TRUE FALSE  FALSE  FALSE FALSE
[5,]  TRUE  TRUE  FALSE  FALSE FALSE
[6,] FALSE FALSE  FALSE  FALSE  TRUE
Count how many fruits other than apple or pear appear in each element (columns 1 and 2 of dat are apple and pear here):
apply(dat[,3:ncol(dat)], 1, sum)
Or create a logical vector indicating which elements contain other fruits:
as.logical(apply(dat[,3:ncol(dat)], 1, sum))
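Two things in the approach above depend on the data: grepl matches substrings (so "apple" would also match "pineapple"), and 3:ncol(dat) relies on apple and pear being the first two columns. A minimal sketch that avoids both, assuming the same comma-separated input, compares whole items with %in% and selects columns by name. The names allowed, tokens, all_fruits, n_other and has_other are illustrative only.

```r
fruits <- c('apple,pear,orange', 'apple,pear',
            'apple,banana', 'apple', 'pear,apple', 'mango')
allowed <- c("apple", "pear")

tokens <- strsplit(fruits, ",", fixed = TRUE)
all_fruits <- unique(unlist(tokens))

# One row per element of fruits, one column per distinct fruit.
# %in% compares whole items, so "apple" does not match "pineapple".
dat <- do.call(rbind, lapply(tokens, function(x) all_fruits %in% x))
colnames(dat) <- all_fruits

# Pick the "other" columns by name rather than by position
other_cols <- setdiff(colnames(dat), allowed)
n_other <- rowSums(dat[, other_cols, drop = FALSE])
has_other <- n_other > 0
has_other
# [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE
```

drop = FALSE keeps the selection a matrix even when only one other fruit exists, so rowSums still works.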
Upvotes: 0