toneloy

Reputation: 133

Regular expression contains a string not in a list

In R, how do you check whether a string contains a substring that is not in a given list? For example, suppose you have a character vector of comma-separated fruit names and want a function that tells you, for each element, whether it contains a fruit other than apple or pear. With the vector below, the result would be something like:

fruits <- c('apple,pear,orange', 'apple,pear', 
            'apple,banana', 'apple', 'pear,apple')

other_fruits(fruits)
# [1] TRUE  FALSE TRUE  FALSE FALSE

Upvotes: 0

Views: 125

Answers (2)

eipi10

Reputation: 93851

If your strings always consist of fruit names separated by commas, you can do this without a regular expression, as in the example below. The same approach can also be adapted to use a regex instead.

fruits <- c('apple,pear,orange', 'apple,pear', 
            'apple,banana', 'apple', 'pear,apple')

sapply(strsplit(fruits,","), function(x) !all(x %in% c("apple","pear")))
[1]  TRUE FALSE  TRUE FALSE FALSE

Or, in general:

other_fruits = function(string, fruit_check) {
  sapply(strsplit(string,","), function(x) !all(x %in% fruit_check))
}

other_fruits(fruits, c("apple","pear"))
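
Since the question asks about regular expressions, here is a rough sketch of a regex-based variant. The name other_fruits_regex is hypothetical, and the sketch assumes the fruit names contain no regex metacharacters and that an empty token (for example from a trailing comma) should count as "other". It looks for a token start (the beginning of the string or a comma) that is not followed by an allowed fruit as a whole token, which needs perl = TRUE for the negative lookahead.

```r
fruits <- c('apple,pear,orange', 'apple,pear',
            'apple,banana', 'apple', 'pear,apple')

# Hypothetical helper: TRUE where some comma-separated token is not in fruit_check
other_fruits_regex <- function(string, fruit_check) {
  # Assumes fruit names have no regex metacharacters
  allowed <- paste(fruit_check, collapse = "|")
  # Token start, NOT followed by an allowed fruit ending at a comma or end of string
  pattern <- paste0("(^|,)(?!(", allowed, ")(,|$))")
  grepl(pattern, string, perl = TRUE)
}

other_fruits_regex(fruits, c("apple", "pear"))
# [1]  TRUE FALSE  TRUE FALSE FALSE
```

Because the lookahead requires the allowed name to end at a comma or the end of the string, a token such as "pineapple" or "apples" is still treated as a different fruit.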

Or, say you want to return the fruits other than the chosen fruits:

other_fruits = function(string, fruit_check) {
  lapply(strsplit(string,","), function(x) {
    if (all(x %in% fruit_check)) NA else x[!(x %in% fruit_check)]
  })
}

other_fruits(fruits, "apple")   
[[1]]
[1] "pear"   "orange"

[[2]]
[1] "pear"

[[3]]
[1] "banana"

[[4]]
[1] NA

[[5]]
[1] "pear"
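
As a possible alternative, base R's setdiff can do the filtering directly. This is only a sketch: it returns an empty character vector rather than NA when nothing is left over, and setdiff also drops duplicated names within an element, which may or may not be what you want. The variable name extra is just for illustration.

```r
fruits <- c('apple,pear,orange', 'apple,pear',
            'apple,banana', 'apple', 'pear,apple')

# setdiff(x, y) keeps the elements of x that are not in y
# (note: it also removes duplicated fruit names within an element)
extra <- lapply(strsplit(fruits, ",", fixed = TRUE), setdiff, y = c("apple", "pear"))

# An element has "other" fruits when its leftover vector is non-empty
lengths(extra) > 0
# [1]  TRUE FALSE  TRUE FALSE FALSE
```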

Upvotes: 2

USER_1

Reputation: 2469

You can create an index to see where the fruits appear like this:

fruits <- c('apple,pear,orange', 'apple,pear', 
            'apple,banana', 'apple', 'pear,apple', 'mango')


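# Distinct fruit names (called fruit_names to avoid masking base::str)
fruit_names <- unique(unlist(strsplit(fruits, ",")))
# One column per fruit name; grepl does substring matching on each element
dat <- sapply(fruit_names, grepl, fruits)
dat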

     apple  pear orange banana mango
[1,]  TRUE  TRUE   TRUE  FALSE FALSE
[2,]  TRUE  TRUE  FALSE  FALSE FALSE
[3,]  TRUE FALSE  FALSE   TRUE FALSE
[4,]  TRUE FALSE  FALSE  FALSE FALSE
[5,]  TRUE  TRUE  FALSE  FALSE FALSE
[6,] FALSE FALSE  FALSE  FALSE  TRUE

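Count how many fruits other than apple or pear appear in each element. Selecting the columns by name avoids relying on apple and pear being the first two columns:

rowSums(dat[, !colnames(dat) %in% c("apple", "pear"), drop = FALSE])

Or create a logical vector indicating which elements contain other fruits:

rowSums(dat[, !colnames(dat) %in% c("apple", "pear"), drop = FALSE]) > 0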
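
One caveat with the grepl-based index: grepl matches substrings, so "apple" would also match inside "pineapple". Below is a minimal sketch of an exact-token version of the same idea. The extra 'pineapple' element and the names tokens, all_fruits, dat_exact, other_cols and has_other are assumptions added for illustration, not part of the original answer.

```r
fruits <- c('apple,pear,orange', 'apple,pear',
            'apple,banana', 'apple', 'pear,apple', 'mango', 'pineapple')

# Split once, then test exact membership of each token
tokens <- strsplit(fruits, ",", fixed = TRUE)
all_fruits <- unique(unlist(tokens))

# Rows = elements of fruits, columns = distinct fruit names
dat_exact <- sapply(all_fruits, function(f) {
  vapply(tokens, function(x) f %in% x, logical(1))
})

# Columns for anything other than the allowed fruits, selected by name
other_cols <- !colnames(dat_exact) %in% c("apple", "pear")
has_other <- rowSums(dat_exact[, other_cols, drop = FALSE]) > 0
has_other
# [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE
```

Here the 'pineapple' row is FALSE in the apple column, whereas grepl("apple", "pineapple") returns TRUE.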

Upvotes: 0
