Reputation: 418
I have a character vector
c1 <- c("BEL","BEL","BEL","BEL")
and another character vector of same length
c2 <- c(" BEL-65_DRe-I_1p:BEL;_LTR_Retrotransposon;_Transposable_Element;_Nonautonomous;_BEL-65_DRe-I", "L1-2_NN_3p:L1;_Non-LTR_Retrotransposon;_Transposable_Element;_L1-2_NN", "BEL-13_CQ-I_1p:BEL;_LTR_Retrotransposon;_Transposable_Element;_BEL-13_CQ_;_BEL-13_CQ-LTR;_BEL-13_CQ-I", "BEL-31_CQ-I_1p:BEL;_LTR_Retrotransposon;_Transposable_Element;_BEL-31_CQ_;_BEL-31_CQ-LTR;_BEL-31_CQ-I", "Gypsy-22_CQ-I_1p:Gypsy;_LTR_Retrotransposon;_Transposable_Element;_Gypsy-22_CQ_;_Gypsy-22_CQ-LTR;_Gypsy-22_CQ-I")
I want to know if each string in c1 is found in c2 at the same index (ignoring case), i.e. if c1[1] is found in c2[1], c1[2] in c2[2], and so on.
In practice, the vectors can have millions of elements.
My current solution is
test <- Map(function(x, y) grepl(x, y, ignore.case = TRUE), c1, c2)
But it's not vectorised, and hence relatively slow. Is there a better solution?
Upvotes: 1
Views: 1480
Reputation: 3062
You could try the following using the stringr package:
library(stringr)
library(data.table)
data <- data.table(c1, c2)
data[, FOUND:= str_detect(toupper(c2), toupper(c1))]
Upvotes: 3
Reputation: 54277
This runs quite fast:
library(stringi)
c1 <- stri_rand_strings(1e6, 2)
c2 <- paste0(stri_rand_strings(1e6, 20), tolower(c1))
system.time(res <- stri_detect(c2, fixed = c1, case_insensitive = TRUE))
# user system elapsed
# 0.73 0.00 0.75
This is partly because I did not check for a regular expression pattern but for a constant string (fixed), which you could also use in grep*.
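As a rough base R sketch of the same fixed-string idea: grepl() is not vectorised over its pattern argument, and it ignores ignore.case when fixed = TRUE (with a warning), so this version folds case with toupper() first and still loops over the pairs with mapply(). The short vectors here are made-up stand-ins for the question's data, and I have not timed this against the stringi call.

```r
# Hypothetical miniature of the question's data (equal lengths).
c1 <- c("bel", "BEL", "Bel")
c2 <- c("BEL-65_DRe-I_1p:BEL", "L1-2_NN_3p:L1", "bel-13_CQ-I_1p:BEL")

# Fold case on both sides, then match each pair as a literal string.
# fixed = TRUE skips the regex engine, so characters such as "." or "("
# in c1 are treated literally.
found <- mapply(
  function(pattern, string) grepl(pattern, string, fixed = TRUE),
  toupper(c1), toupper(c2),
  USE.NAMES = FALSE
)

print(found)
```

The result is a plain logical vector with one element per pair, which is easier to work with than the named list that Map() returns.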
Upvotes: 4
Reputation: 301
An alternative that works much like your solution is to use apply.
It works well for this small example; whether it will be faster for bigger data, I do not know.
apply(rbind(c1, c2), 2, function(y) grepl(pattern = y[1], x = y[2], ignore.case = TRUE))
[1] TRUE FALSE TRUE TRUE FALSE
Edited: I had to add one more "BEL" to c1 to make it work, because your c1 consists of 4 elements and c2 of 5.
Upvotes: 1