Jbnimble

Reputation: 39

Extract Only the Second Instance of Pattern in Regex

I am trying to extract the second instance of a pattern from a string with a regex in R (version 4.0.2), using the stringr package.

test_string <- "Viscocity                      S   <=0.25        S   <=0.25 Levorotatory                      S      <=21        R      <=2.5 Giminal                    S      <=1        S      <=1"

I have the following regex which can pull the first pattern (specifically for Levorotatory):

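library(stringr)  # str_extract_all(), str_trim()
library(glue)     # glue() interpolates {regex} into the pattern
regex <- "(\\s*(?:S|R|I|N/I)(\\s*\\W*\\d*\\.?\\d?\\d?\\d?\\s*))"
str_trim(str_extract_all(test_string, glue('(?<=Levorotatory){regex}')))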

Which gives me the output:

"S      <=21"

But I want to grab the second pattern: R <=2.5. So far, I have been able to pull both patterns using a quantifier:

regex <- "(\\s*(?:S|R|I|N/I)(\\s*\\W*\\d*\\.?\\d?\\d?\\d?\\s*)){2}"
str_trim(str_extract_all(test_string, glue('(?<=Levorotatory){regex}')))
output: "S      <=21        R      <=2.5"

This is not exactly what I was looking for.

My question: Can I grab only the second instance of a regex pattern?

There are a handful of similar posts: here, here, and here, but I tried fiddling with these solutions with no luck.

Upvotes: 1

Views: 313

Answers (1)

Wiktor Stribiżew

Reputation: 626845

You may use a pattern like below with str_match:

(?<=Levorotatory)(?:\s*([SRI]|N/I)\s*([^\w\s]*\d*\.?\d+)){2}

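See the regex demo. The {X} quantifier at the end controls which occurrence you get: a repeated capturing group keeps only the text from its last iteration, so with {2} the groups hold the second occurrence. Details: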

  • (?<=Levorotatory) - right before the current location, there must be Levorotatory (note you may just use Levorotatory here)
  • (?:\s*([SRI]|N/I)\s*([^\w\s]*\d*\.?\d+)){2} - two occurrences of
    • \s* - zero or more whitespaces
    • ([SRI]|N/I) - S, R, I or N/I
    • \s* - zero or more whitespaces
    • ([^\w\s]*\d*\.?\d+) - zero or more punctuation chars other than _, 0+ digits, an optional . and one or more digits.
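
To see how the quantifier selects the occurrence, here is a minimal sketch that reuses the same repeated unit and changes only the count. The names unit, first and second are illustrative, the input is a shortened version of the string from the question, and the plain Levorotatory prefix is used instead of the lookbehind (the note above says either works):

```r
library(stringr)

# Shortened sample input (assumption: same layout as the original string)
x <- "Levorotatory   S   <=21   R   <=2.5 Giminal   S   <=1"

# One status/value pair; identical to the repeated group in the pattern above
unit <- "(?:\\s*([SRI]|N/I)\\s*([^\\w\\s]*\\d*\\.?\\d+))"

# The capture groups only remember the last repetition of the unit,
# so the quantifier decides which pair ends up in the result
first  <- str_match(x, paste0("Levorotatory", unit, "{1}"))[, -1]
second <- str_match(x, paste0("Levorotatory", unit, "{2}"))[, -1]

first   # "S" "<=21"
second  # "R" "<=2.5"
```

With {3} there would be no match here, because the third pair is preceded by Giminal, which the unit cannot consume, so str_match would return NA values.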

See an R demo:

library(stringr)
pattern <- "(?<=Levorotatory)(?:\\s*([SRI]|N/I)\\s*([^\\w\\s]*\\d*\\.?\\d+)){2}"
x <- "Viscocity                      S   <=0.25        S   <=0.25 Levorotatory                      S      <=21        R      <=2.5 Giminal                    S      <=1        S      <=1"
results <- str_match(x, pattern)[,-1]
results
# => [1] "R"     "<=2.5"
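
If the quantifier trick feels opaque, a different approach (not the one used above) is to collect every pair that follows the label and pick a row by index. This is only a sketch: after_label, pairs and second_pair are made-up names, and it assumes no other label in the string looks like a status letter directly followed by a number.

```r
library(stringr)

x <- "Viscocity                      S   <=0.25        S   <=0.25 Levorotatory                      S      <=21        R      <=2.5 Giminal                    S      <=1        S      <=1"

# Everything after the label of interest
after_label <- str_extract(x, "(?<=Levorotatory).*")

# One row per status/value pair; column 1 is the full match,
# columns 2 and 3 are the status and the value
pairs <- str_match_all(after_label, "([SRI]|N/I)\\s*([^\\w\\s]*\\d*\\.?\\d+)")[[1]]

# Row 2 is the second pair after the label
second_pair <- pairs[2, -1]
second_pair
```

Note that the rows keep going past the next label (the Giminal pairs show up as rows 3 and 4 here), so only index as far as the number of pairs you expect per label.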

Upvotes: 1
