Reputation: 749

Extract all substrings meeting criteria using R regex

I have an extremely long string in R and would like to extract all substrings that match certain criteria. The string may look something like this: "some text some text some text [ID: 1234] some text some text [ID: 5678] some text some text [ID: 9999]."

I have seen other questions like this that offer gsub as a solution, but those seem to cover the case where only one substring needs to be extracted, not several.

What I would like to achieve as a result is a vector like this:

c("[ID: 1234]", "[ID: 5678]", "[ID: 9999]")

Upvotes: 2

Answers (3)

d.b

Reputation: 32558

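# x is the example string from the question
x = "some text some text some text [ID: 1234] some text some text [ID: 5678] some text some text [ID: 9999]."
# gregexpr() returns the start position and length of every match
inds = gregexpr("\\[ID: \\d+\\]", x)
# Cut each match out of x using its start position and match length
lapply(inds, function(i){
    substring(x, i, i + attr(i, "match.length") - 1)
})
#[[1]]
#[1] "[ID: 1234]" "[ID: 5678]" "[ID: 9999]"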

Upvotes: 0

Gregor Thomas

Reputation: 146129

x = "some text some text some text [ID: 1234] some text some text [ID: 5678] some text some text [ID: 9999]."
unlist(stringr::str_extract_all(x, "\\[ID: \\d+\\]"))
# [1] "[ID: 1234]" "[ID: 5678]" "[ID: 9999]"

Upvotes: 3

Hayden Y.

Reputation: 448

Using base R, an option would be

regmatches(text, gregexpr(pattern, text))

which returns a list with one element per input string; you can then unlist() it if you want your output as an atomic vector.
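
As a minimal sketch of how this might look with the string from the question: the variable names x, pattern and ids are only illustrative, and the pattern assumes every ID is a run of digits inside "[ID: ...]".

```r
# Example input taken from the question
x <- "some text some text some text [ID: 1234] some text some text [ID: 5678] some text some text [ID: 9999]."

# Assumed pattern: literal "[ID: ", one or more digits, literal "]"
pattern <- "\\[ID: [0-9]+\\]"

# gregexpr() locates every match; regmatches() extracts the matched text
# as a list, which unlist() flattens into a character vector
ids <- unlist(regmatches(x, gregexpr(pattern, x)))
print(ids)
# [1] "[ID: 1234]" "[ID: 5678]" "[ID: 9999]"
```

If x held several strings, dropping the unlist() call would keep the matches grouped per input string.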

Upvotes: 2
