Antti

Reputation: 1293

How to pick the subset of strings that appear in a longer string (in R)?

I have a small set of identifiers:

ids <- c("abc", "def", "ghi", "jkl")

I also have a set of longer strings:

stringList <- list(
               string1 = "fgjalk klsdkabc ghi", 
               string2 = "DFjklHJHU defhkk")

I want to make a list of vectors, each containing the subset of identifiers that appear in the corresponding string:

 idList <- list(id1 = c("abc", "ghi"), id2 = c("def", "jkl"))

What is the most efficient way to do this?

I've tried with

lapply(seq_along(stringList), function(x) grepl(ids, stringList[[x]]))

but grepl() only uses the first element of ids as its pattern, so I think I'm still missing the part that checks every identifier in the vector, not just the first one.

Upvotes: 0

Views: 37

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

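Maybe something like this would work for you. It collapses ids into a single alternation pattern ("abc|def|ghi|jkl") and extracts every match from each string with gregexpr() and regmatches():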

lapply(stringList, function(x) {
  regmatches(x, gregexpr(paste(ids, collapse = "|"), x))[[1]]
})
# $string1
# [1] "abc" "ghi"
# 
# $string2
# [1] "jkl" "def"
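
Note that the approach above returns matches in the order they occur in each string, which is why string2 gives "jkl" before "def". If you would rather get them in the order of ids, as in the idList from the question, one possible alternative is sketched below. It assumes the identifiers should be matched as literal substrings (hence fixed = TRUE) and that each identifier should be reported at most once per string; neither assumption is stated in the question.

```r
ids <- c("abc", "def", "ghi", "jkl")
stringList <- list(
  string1 = "fgjalk klsdkabc ghi",
  string2 = "DFjklHJHU defhkk")

# For each string, test every identifier as a literal substring
# (fixed = TRUE avoids regex interpretation) and keep those that match.
# The result follows the order of ids, not the order of appearance.
idList <- lapply(stringList, function(x) {
  found <- vapply(ids, function(id) grepl(id, x, fixed = TRUE), logical(1))
  ids[found]
})

idList
# $string1
# [1] "abc" "ghi"
#
# $string2
# [1] "def" "jkl"
```

Because each identifier is tested separately, this is probably only worthwhile for a small set of ids; for many identifiers the single combined pattern is likely to be faster.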

Upvotes: 1
