Henk

Reputation: 3656

Check each element of a vector against all rows of a data frame

I have a vector and want to check each of its elements against every row of a data frame. This requires a grep-style match, because the elements to look for are buried in other text.

With help from this forum, I arrived at this code:

    mat=data.frame(par=c('long A story','C story', 'blabla D'),val=1:3) 
    vec=c('Z','D','A')
    mat$label <- NA
    for (x in vec){
       is.match <- lapply(mat$par,function(y) grep(x, y))
       mat$label[which(is.match > 0)] <- x
    }

The problem is that it takes minutes to execute. Is there a way to vectorize this?

Upvotes: 2

Views: 940

Answers (1)

sebastian-c

Reputation: 15395

I've assumed you only want the first match in each case:

    which.matches <- grep("[ZDA]", mat$par)
    what.matches <- regmatches(mat$par, regexpr("[ZDA]", mat$par))

    mat$label[which.matches] <- what.matches
    mat

               par val label
    1 long A story   1     A
    2      C story   2  <NA>
    3     blabla D   3     D
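The character class `[ZDA]` above is hard-coded and only works for single-character elements. If `vec` can hold longer strings, one option is to build an alternation pattern from it. This is only a sketch: it assumes the elements of `vec` contain no regex metacharacters, and the names `pattern` and `has.match` are introduced here for illustration.

```r
mat <- data.frame(par = c("long A story", "C story", "blabla D"),
                  val = 1:3, stringsAsFactors = FALSE)
vec <- c("Z", "D", "A")

# Assumption: elements of vec are plain text, so "|" can join them safely
pattern <- paste(vec, collapse = "|")      # "Z|D|A"

mat$label <- NA_character_
has.match <- grepl(pattern, mat$par)       # TRUE for rows containing any element

# regmatches() returns only the matched substrings, one per matching row,
# so its length equals sum(has.match)
mat$label[has.match] <- regmatches(mat$par, regexpr(pattern, mat$par))
mat
```

As with the character class, `regexpr` keeps only the first (leftmost) match in each row.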

EDIT: Benchmarking

    Unit: microseconds
               expr     min       lq  median       uq      max
    1   answer(mat) 185.338 194.0925 199.073 209.1850  898.919
    2 question(mat) 672.227 693.9610 708.601 725.6555 1457.046

EDIT 2:

As @mrdwab suggested, this can actually be used as a one-liner:

    mat$label[grep("[ZDA]", mat$par)] <- regmatches(mat$par, regexpr("[ZDA]", mat$par))
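
The one-liner relies on both sides having the same length: `grep` returns the indices of the matching rows, and `regmatches` drops the rows without a match. A small check of that, as a sketch (the names `idx` and `hits` are mine, and `mat$label` is initialised first as in the question):

```r
mat <- data.frame(par = c("long A story", "C story", "blabla D"),
                  val = 1:3, stringsAsFactors = FALSE)
mat$label <- NA  # the column must exist before the indexed assignment

idx  <- grep("[ZDA]", mat$par)                          # indices of matching rows
hits <- regmatches(mat$par, regexpr("[ZDA]", mat$par))  # matched text, non-matching rows dropped

# Both vectors describe the same rows in the same order
stopifnot(length(idx) == length(hits))

mat$label[idx] <- hits
mat$label
```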

Upvotes: 3
