Reputation: 3764
With two vectors
x <- c("abc", "12")
y <- c("bc", "123", "nomatch")
is there a way to filter both by 'two-way' partial matching (remove elements in one vector if they contain or are contained in any element of the other vector) so that the result is these two vectors:
x1 <- c()
y1 <- c("nomatch")
To explain: every element of x is either a substring or a superstring of one of the elements of y, hence x1 is empty. Update: it is not sufficient for a substring to match the initial characters; a substring might be found anywhere in the string it matches. The example above has been updated to reflect this.
Upvotes: 4
Views: 435
Reputation: 93813
I originally thought ?pmatch might be handy, but your edit clarifies that you don't just want to match the start of items. Here's a function that should work:
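remover <- function(x, y) {
  # for each pattern, the indices of the elements of the other vector that contain it
  pmx <- lapply(x, grep, x = y)   # indices into y
  pmy <- lapply(y, grep, x = x)   # indices into x
  # keep x and y indices separate: an element is dropped if it was found
  # inside the other vector or if it contains one of that vector's elements
  hitx <- union(which(lengths(pmx) > 0), unlist(pmy))
  hity <- union(which(lengths(pmy) > 0), unlist(pmx))
  list(
    x[!(seq_along(x) %in% hitx)],
    y[!(seq_along(y) %in% hity)]
  )
}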
remover(x,y)
#[[1]]
#character(0)
#
#[[2]]
#[1] "nomatch"
It correctly does nothing when no match is found (thanks @Frank for picking up the earlier error):
remover("yo","nomatch")
#[[1]]
#[1] "yo"
#
#[[2]]
#[1] "nomatch"
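One caveat: grep() and grepl() treat the pattern as a regular expression by default, so elements containing characters such as "." can match more than a plain substring test would. Below is a minimal sketch, assuming literal substring matching is what is wanted. The helper name is_substring_of_any and the example vectors are made up for illustration and are not part of the function above.

```r
# Hypothetical helper: TRUE for each element of `a` that occurs literally
# inside at least one element of `b` (fixed = TRUE switches off regex handling).
is_substring_of_any <- function(a, b) {
  vapply(a, function(p) any(grepl(p, b, fixed = TRUE)), logical(1),
         USE.NAMES = FALSE)
}

# Illustrative data (an assumption, not the vectors from the question)
x <- c("a.c", "12")
y <- c("abc", "123", "nomatch")

regex_hit   <- any(grepl("a.c", y))       # "." is a wildcard here, so "abc" counts as a hit
literal_hit <- is_substring_of_any(x, y)  # literal test: only "12" is found inside y
```

If the vectors can contain regex metacharacters, passing fixed = TRUE through to grep() in the function above would probably be the simplest adjustment.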
Upvotes: 3
Reputation: 47320
I build a matrix of all possible matches in both directions, combine the two with | (a match in either direction is equally a match), and then use it to subset x and y:
x <- c("abc", "12")
y <- c("bc", "123", "nomatch")
bool_mat <- sapply(x,function(z) grepl(z,y)) | t(sapply(y,function(z) grepl(z,x)))
x1 <- x[!apply(bool_mat,2,any)] # character(0)
y1 <- y[!apply(bool_mat,1,any)] # [1] "nomatch"
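The same idea can be wrapped in a function. This is only a sketch under a few assumptions: the names two_way_filter and contains are hypothetical, matching is literal (fixed = TRUE) rather than regex-based, and vapply() plus matrix() are used so the intermediate result stays a matrix even when one of the inputs has length one (sapply() would simplify it to a plain vector in that case).

```r
# Hypothetical wrapper around the match-matrix idea
two_way_filter <- function(x, y) {
  # contains(pattern, text)[i, j] is TRUE when pattern[j] occurs literally in text[i]
  contains <- function(pattern, text) {
    m <- vapply(pattern, function(p) grepl(p, text, fixed = TRUE),
                logical(length(text)))
    matrix(m, nrow = length(text), ncol = length(pattern))
  }
  # rows index y, columns index x; a hit in either direction counts
  hit <- contains(x, y) | t(contains(y, x))
  list(x = x[!apply(hit, 2, any)], y = y[!apply(hit, 1, any)])
}

res <- two_way_filter(c("abc", "12"), c("bc", "123", "nomatch"))
res$x  # character(0)
res$y  # "nomatch"
```

Returning a named list keeps it clear which filtered vector is which.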
Upvotes: 1
Reputation: 50678
We can do the following:
# Return data.frame of matches of a in b
m <- function(a, b) {
  data.frame(sapply(a, function(w) grepl(w, b), simplify = FALSE))
}
# Match x and y and remove
x0 <- x[!apply(m(x, y), 2, any)]
y0 <- y[!apply(m(x, y), 1, any)]
# Match y and x and remove
x1 <- x0[!apply(m(y0, x0), 1, any)]
y1 <- y0[!apply(m(y0, x0), 2, any)]
x1
#character(0)
y1
#[1] "nomatch"
Upvotes: 1