Extract Only the Second Instance of Pattern in Regex

Question

I am trying to extract the second instance of a pattern from a string using regular expressions in R (version 4.0.2) with the stringr package.

test_string <- "Viscocity                      S   <=0.25        S   <=0.25 Levorotatory                      S      <=21        R      <=2.5 Giminal                    S      <=1        S      <=1"

I have the following regex which can pull the first pattern (specifically for Levorotatory):

library(stringr)
library(glue)
regex <- "(\\s*(?:S|R|I|N/I)(\\s*\\W*\\d*\\.?\\d?\\d?\\d?\\s*))"
str_trim(str_extract_all(test_string, glue('(?<=Levorotatory){regex}')))

Which gives me the output:

"S      <=21"

But I want to grab the second pattern: R <=2.5. So far, I have only been able to pull both patterns together using a quantifier:

regex <- "(\\s*(?:S|R|I|N/I)(\\s*\\W*\\d*\\.?\\d?\\d?\\d?\\s*)){2}"
str_trim(str_extract_all(test_string, glue('(?<=Levorotatory){regex}')))
# output: "S      <=21        R      <=2.5"

This is not exactly what I was looking for.

My question: Can I grab only the second instance of a regex pattern?

There are a handful of similar posts: here, here, and here, but I tried fiddling with these solutions with no luck.

Wiktor Stribiżew · Accepted Answer

You can use a pattern like the one below with str_match:

(?<=Levorotatory)(?:\s*([SRI]|N/I)\s*([^\w\s]*\d*\.?\d+)){2}

See the regex demo. A repeated capturing group keeps only the text from its last iteration, so the number in {X} at the end controls which occurrence the groups return. Details:

(?<=Levorotatory) - right before the current location, there must be Levorotatory (note you may just use Levorotatory here)
(?:\s*([SRI]|N/I)\s*([^\w\s]*\d*\.?\d+)){2} - two occurrences of
- \s* - zero or more whitespaces
- ([SRI]|N/I) - S, R, I or N/I
- \s* - zero or more whitespaces
- ([^\w\s]*\d*\.?\d+) - zero or more chars that are neither word chars nor whitespace (punctuation and symbols, but not _), zero or more digits, an optional . and one or more digits.
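
To make the {X} idea concrete, here is a minimal sketch that builds the pattern for an arbitrary occurrence number. The helper name nth_instance is hypothetical (it is not part of stringr), the sample string is a shortened version of the one in the question, and the label is assumed to contain no regex metacharacters:

```r
library(stringr)

# Hypothetical helper (not part of stringr): returns the flag and the value of
# the n-th "flag value" pair that follows `label`.
# Assumption: `label` contains no regex metacharacters, so it is not escaped.
nth_instance <- function(x, label, n) {
  pattern <- paste0(
    label,
    "(?:\\s*([SRI]|N/I)\\s*([^\\w\\s]*\\d*\\.?\\d+)){", n, "}"
  )
  # Column 1 of the str_match() result is the whole match; the remaining
  # columns hold what the two groups captured on the last repetition.
  str_match(x, pattern)[, -1]
}

# Shortened sample input, same layout as the string in the question
x <- "Viscocity   S   <=0.25   S   <=0.25 Levorotatory   S   <=21   R   <=2.5 Giminal   S   <=1   S   <=1"

nth_instance(x, "Levorotatory", 1)  # values: "S" and "<=21"
nth_instance(x, "Levorotatory", 2)  # values: "R" and "<=2.5"
```

If fewer than n pairs follow the label, the pattern does not match and str_match returns NA for every group.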

See an R demo:

library(stringr)
pattern <- "(?<=Levorotatory)(?:\\s*([SRI]|N/I)\\s*([^\\w\\s]*\\d*\\.?\\d+)){2}"
x <- "Viscocity                      S   <=0.25        S   <=0.25 Levorotatory                      S      <=21        R      <=2.5 Giminal                    S      <=1        S      <=1"
results <- str_match(x, pattern)[,-1]
results
# => [1] "R"     "<=2.5"
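
A different route, not the one used in the answer above, is to collect every pair that follows the label and then index the one you want. The sketch below assumes the same input layout; the variable names run, pairs, and second are made up for illustration:

```r
library(stringr)

x <- "Viscocity                      S   <=0.25        S   <=0.25 Levorotatory                      S      <=21        R      <=2.5 Giminal                    S      <=1        S      <=1"

# Step 1: grab the whole run of "flag value" pairs right after the label.
# The run stops at the next label because "Giminal" is not a valid flag.
run <- str_extract(
  x,
  "(?<=Levorotatory)(?:\\s*(?:[SRI]|N/I)\\s*[^\\w\\s]*\\d*\\.?\\d+)+"
)

# Step 2: split the run into one row per pair
# (column 1 = full pair, column 2 = flag, column 3 = value).
pairs <- str_match_all(run, "([SRI]|N/I)\\s*([^\\w\\s]*\\d*\\.?\\d+)")[[1]]

# Step 3: pick the second pair and drop the full-match column.
second <- pairs[2, -1]
second  # values: "R" and "<=2.5"
```

This version needs two regex passes, but it makes every pair available at once, which may be handy if you need more than one of them.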
