Max

Return value from previous row in regex

I want to use a regex to return a specific group from the row that precedes a given row.

Suppose I have the following data, and the goal is to extract the value 90 based on the tag in the line that follows it.

QTY+66:90:PCE
SCC+2
DTM+45:20200416:15
QTY+66:60:PCE
SCC+3
DTM+35:20210614:2

To target the value 90, I would have to look for the SCC+2 tag; to get the value 60, I would look for the SCC+3 tag.

In my attempt to return the value 90, I got as far as (?<=^QTY\+66:)(\d+)(.*\n.*SCC\+2.*), but it seems convoluted and I can't manage to extract only Group 1. Here is the link to regex101. I am using R for the actual application. Thanks for the help!
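For the "only Group 1" part, a minimal sketch, assuming the stringr package is installed: `str_match()` returns a matrix whose first column is the full match and whose remaining columns are the capture groups, so column 2 holds Group 1. The pattern below is a simplified rewrite of the attempt above (no lookbehind), and the `(?m)` flag is an addition so that `^` matches at the start of every line rather than only at the start of the string.

```r
library(stringr)

vec <- "QTY+66:90:PCE\nSCC+2\nDTM+45:20200416:15\nQTY+66:60:PCE\nSCC+3\nDTM+35:20210614:2"

# (?m): ^ matches at each line start, not only at the start of the string.
# Group 1 is the quantity on a QTY+66 line; the rest only checks the next line.
pattern <- "(?m)^QTY\\+66:(\\d+).*\\n.*SCC\\+2"

m <- str_match(vec, pattern)  # one row: full match, then Group 1
m[, 2]
#> [1] "90"
```

Swapping `SCC\\+2` for `SCC\\+3` in the pattern should return "60" instead, since `(?m)` lets the second QTY+66 line start a match.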


Answers (1)

Wiktor Stribiżew

You can use

(?<=:)\d+(?=[^\d\r\n]*[\r\n]+.*SCC\+2)

See the regex demo. Details:

  • (?<=:) - a : must occur immediately to the left of the current location
  • \d+ - one or more digits
  • (?=[^\d\r\n]*[\r\n]+.*SCC\+2) - a positive lookahead: immediately to the right, there must be
      • [^\d\r\n]* - zero or more chars other than digits, CR and LF (so the matched number is the last number on its line)
      • [\r\n]+ - one or more CR or LF chars
      • .*SCC\+2 - any text on the next line up to the rightmost occurrence of SCC+2
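Since the question also asks for 60 via SCC+3, the same lookaround pattern can be built for any tag. A minimal sketch, assuming stringr is installed; `extract_qty` is a hypothetical helper name, and `tag` is assumed to be a plain number so it needs no regex escaping.

```r
library(stringr)

vec <- "QTY+66:90:PCE\nSCC+2\nDTM+45:20200416:15\nQTY+66:60:PCE\nSCC+3\nDTM+35:20210614:2"

# Hypothetical helper: pastes the SCC tag into the lookahead from the answer.
# Assumes `tag` is numeric (e.g. 2 or 3), so no escaping is needed.
extract_qty <- function(x, tag) {
  pattern <- paste0("(?<=:)\\d+(?=[^\\d\\r\\n]*[\\r\\n]+.*SCC\\+", tag, ")")
  str_extract(x, pattern)
}

extract_qty(vec, 2)
#> [1] "90"
extract_qty(vec, 3)
#> [1] "60"
```

Because `.` in stringr (ICU) does not cross line breaks, `.*SCC\+3` can only look at the single line after the number, which is why the 90 line is skipped for tag 3.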

In R, you can use

library(stringr)
str_extract(vec, "(?<=:)\\d+(?=[^\\d\r\n]*[\r\n]+.*SCC\\+2)")

And a couple of base R approaches with sub:

sub(".*?\\+\\d+:(\\d+)[^\r\n]*[\r\n]+[^\r\n]*SCC\\+2.*", "\\1", vec)
sub("(?s).*?\\+\\d+:(\\d+)(?-s).*\\R.*SCC\\+2(?s).*", "\\1", vec, perl=TRUE)

See regex 1 demo and regex 2 demo.
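The two `sub` patterns differ mainly in how they keep the match on one line, because the engines disagree on whether `.` matches a line feed: the default TRE pattern uses `[^\r\n]` to stay on a line, while the PCRE pattern switches `(?s)` off and back on around the part that must stay on one line. A quick check on a made-up string `s` (an illustration only), assuming stringr is installed for the last line:

```r
library(stringr)

s <- "a\nb"

# Base R default engine (TRE): . also matches a line feed
grepl("a.b", s)                   # TRUE
# PCRE (perl = TRUE): . stops at a line feed unless (?s) is on
grepl("a.b", s, perl = TRUE)      # FALSE
grepl("(?s)a.b", s, perl = TRUE)  # TRUE
# ICU (stringr): like PCRE, . does not match a line feed by default
str_detect(s, "a.b")              # FALSE
```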

See the R demo online:

vec <- "QTY+66:90:PCE\nSCC+2\nDTM+45:20200416:15\nQTY+66:60:PCE\nSCC+3\nDTM+35:20210614:2"
# base R, default (TRE) engine
sub(".*?\\+\\d+:(\\d+)[^\r\n]*[\r\n]+[^\r\n]*SCC\\+2.*", "\\1", vec)
# base R, PCRE engine
sub("(?s).*?\\+\\d+:(\\d+)(?-s).*\\R.*SCC\\+2(?s).*", "\\1", vec, perl=TRUE)
# stringr (ICU engine)
library(stringr)
str_extract(vec, "(?<=:)\\d+(?=[^\\d\r\n]*[\r\n]+.*SCC\\+2)")

All yield [1] "90".
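One caveat with the `sub` approaches: when the pattern does not match, `sub` returns the input string unchanged rather than an empty or missing value. A minimal base R sketch that returns `NA` instead, using `regexec()` with `perl = TRUE` and `regmatches()`; the helper name `qty_for_tag` and its pattern are illustrative, not part of the answer above, and `tag` is assumed to be numeric.

```r
vec <- "QTY+66:90:PCE\nSCC+2\nDTM+45:20200416:15\nQTY+66:60:PCE\nSCC+3\nDTM+35:20210614:2"

# Hypothetical helper: Group 1 is the number after a colon; the rest of that line
# and the next line (which must contain SCC+<tag>) are matched but not kept.
qty_for_tag <- function(x, tag) {
  pattern <- paste0(":(\\d+)[^\\r\\n]*\\R[^\\r\\n]*SCC\\+", tag)
  groups <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each element is c(full_match, group1), or character(0) when nothing matched
  vapply(groups, function(g) if (length(g) >= 2) g[2] else NA_character_, character(1))
}

qty_for_tag(vec, 2)
#> [1] "90"
qty_for_tag(vec, 9)
#> [1] NA
# For comparison: sub() hands back the whole input when there is no match
identical(sub(".*?\\+\\d+:(\\d+)[^\r\n]*[\r\n]+[^\r\n]*SCC\\+9.*", "\\1", vec), vec)
#> [1] TRUE
```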
