Reputation: 749
I have an extremely long string in R and would like to extract all substrings that match a certain criterion. The string may look something like this: "some text some text some text [ID: 1234] some text some text [ID: 5678] some text some text [ID: 9999]."
I have seen similar questions where gsub is offered as a solution, but that seems to apply when only one substring needs to be extracted, not several.
What I would like to achieve as a result is a vector like this:
c("[ID: 1234]", "[ID: 5678]", "[ID: 9999]")
Upvotes: 2
Views: 792
Reputation: 32558
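# x is the string from the question; gregexpr() returns the start positions of all matches
inds = gregexpr("\\[ID: \\d+\\]", x)
# Cut out each match using its start position and match length
lapply(inds, function(i){
  substring(x, i, i + attr(i, "match.length") - 1)
})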
#[[1]]
#[1] "[ID: 1234]" "[ID: 5678]" "[ID: 9999]"
Upvotes: 0
Reputation: 146129
x = "some text some text some text [ID: 1234] some text some text [ID: 5678] some text some text [ID: 9999]."
unlist(stringr::str_extract_all(x, "\\[ID: \\d+\\]"))
# [1] "[ID: 1234]" "[ID: 5678]" "[ID: 9999]"
Upvotes: 3
Reputation: 448
Using base R, one option is:
regmatches(text, gregexpr(pattern, text))
This returns a list with one element per input string, which you can then unlist() if you want your output as an atomic vector.
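As a minimal sketch, here is the base R approach applied to the example string from the question. The variable names x, pattern, and ids are chosen here for illustration only:

```r
# Example string from the question
x <- "some text some text some text [ID: 1234] some text some text [ID: 5678] some text some text [ID: 9999]."

# Pattern: a literal "[ID: ", one or more digits, then a literal "]"
pattern <- "\\[ID: \\d+\\]"

# gregexpr() locates every match; regmatches() extracts the matched text.
# The result is a list with one element per input string, so unlist() flattens it.
ids <- unlist(regmatches(x, gregexpr(pattern, x)))
print(ids)
# [1] "[ID: 1234]" "[ID: 5678]" "[ID: 9999]"
```

If x were a vector of several strings, skipping unlist() would keep the matches grouped by the string they came from.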
Upvotes: 2