Reputation: 39
I am trying to extract the second instance of a pattern from a string using regexes in the programming language R, version 4.0.2, and the stringr package.
> test_string <- "Viscocity S <=0.25 S <=0.25 Levorotatory S <=21 R <=2.5 Giminal S <=1 S <=1"
I have the following regex which can pull the first pattern (specifically for Levorotatory):
regex <- "(\\s*(?:S|R|I|N/I)(\\s*\\W*\\d*\\.?\\d?\\d?\\d?\\s*))"
str_trim(str_extract_all(test_string, glue('(?<=Levorotatory){regex}')))
Which gives me the output:
"S <=21"
But I want to grab the second pattern: R <=2.5
So far, I have been able to pull both patterns using a quantifier:
regex <- "(\\s*(?:S|R|I|N/I)(\\s*\\W*\\d*\\.?\\d?\\d?\\d?\\s*)){2}"
str_trim(str_extract_all(test_string, glue('(?<=Levorotatory){regex}')))
output: "S <=21 R <=2.5"
This is not exactly what I was looking for.
My question: Can I grab only the second instance of a regex pattern?
There are a handful of similar posts: here, here, and here, but I tried fiddling with these solutions with no luck.
Upvotes: 1
Views: 313
Reputation: 626845
You may use a pattern like the one below with str_match:
(?<=Levorotatory)(?:\s*([SRI]|N/I)\s*([^\w\s]*\d*\.?\d+)){2}
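See the regex demo. You may control which match you get with the {X} quantifier at the end: a repeated capturing group keeps only the text from its last iteration, so {2} returns the second instance. Details: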
(?<=Levorotatory) - right before the current location, there must be Levorotatory (note you may just use Levorotatory here)
(?:\s*([SRI]|N/I)\s*([^\w\s]*\d*\.?\d+)){2} - two occurrences of
    \s* - zero or more whitespaces
    ([SRI]|N/I) - S, R, I or N/I
    \s* - zero or more whitespaces
    ([^\w\s]*\d*\.?\d+) - zero or more punctuation chars other than _, 0+ digits, an optional . and one or more digits.
See an R demo:
library(stringr)
pattern <- "(?<=Levorotatory)(?:\\s*([SRI]|N/I)\\s*([^\\w\\s]*\\d*\\.?\\d+)){2}"
x <- "Viscocity S <=0.25 S <=0.25 Levorotatory S <=21 R <=2.5 Giminal S <=1 S <=1"
results <- str_match(x, pattern)[,-1]
results
# => [1] "R" "<=2.5"
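To make the "{X} at the end" idea reusable, the quantifier can be built from a number. The following is only a sketch: nth_pair_after is a hypothetical helper name, not part of stringr, and it assumes the keyword contains no regex metacharacters. It uses the plain keyword instead of the lookbehind, since only the capture groups are kept.

```r
library(stringr)

# Hypothetical helper (not a stringr function): return the n-th
# code/value pair that follows `keyword` in `x`.
# Assumes `keyword` contains no regex metacharacters.
nth_pair_after <- function(x, keyword, n) {
  # One code (S, R, I or N/I) followed by an operator and a number.
  pair <- "\\s*([SRI]|N/I)\\s*([^\\w\\s]*\\d*\\.?\\d+)"
  # Repeat the pair n times; the capture groups keep the last repetition.
  pattern <- paste0(keyword, "(?:", pair, "){", n, "}")
  # Drop column 1 (the whole match) and keep the two capture groups.
  str_match(x, pattern)[, -1]
}

x <- "Viscocity S <=0.25 S <=0.25 Levorotatory S <=21 R <=2.5 Giminal S <=1 S <=1"
nth_pair_after(x, "Levorotatory", 1)
# => [1] "S"    "<=21"
nth_pair_after(x, "Levorotatory", 2)
# => [1] "R"     "<=2.5"
```

If fewer than n pairs follow the keyword, the pattern does not match and str_match returns NA values, so the result should be checked before use.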
Upvotes: 1