Reputation: 421
I'm struggling with defining a regex that can find the following:
For all PLUXXX,Y=11, I need to use XXX to find the value associated with REFXXX.
Given the input:
REF178=1234
OTHER RANDOM DATA LINE
PLU178,1=11
OTHER RANDOM DATA LINE
PLU179,1=11
REF179=5678
OTHER RANDOM DATA LINE
OTHER RANDOM DATA LINE
REF180=5432
PLU180,1=12
REF181=123432
I would like to get 1234 and 5678 back. REF180 is ignored since PLU180 equals 12. REF181 is ignored since there is no matching PLU.
The input file has many lines and multiple REF/PLU pairs with different XXX values. The order of the REF/PLU lines is random, and there are other lines of data in between.
I'm only interested in the REF values if the matching PLU has a value of 11.
I can match the PLU with
(PLU)(.+)(\,.+=11)
knowing that the 2nd capture group somehow should be paired with REF to find the wanted value.
I imagine that positive lookahead should be used, but can't figure out the correct syntax.
Upvotes: 1
Views: 98
Reputation: 627419
You can use
^REF(\d+)=\K\d+(?=(?:\R(?!REF\d).*)*?\RPLU\1,1=11$)|^PLU(\d+),1=11(?:\R(?!REF\d).*)*?\RREF\2=\K\d+
See the regex demo (the m flag is on automatically in Notepad++). Note that the ". matches newline" option must be disabled.
Details:
^REF(\d+)=\K\d+(?=(?:\R(?!REF\d).*)*?\RPLU\1,1=11$) - Cases where REF is before PLU:
    ^REF - REF at the start of a line
    (\d+) - Group 1: one or more digits
    = - a = char
    \K - removes all text matched so far
    \d+ - one or more digits (consumed text)
    (?=(?:\R(?!REF\d).*)*?\RPLU\1,1=11$) - a positive lookahead that makes sure there are zero or more (but as few as possible) lines not starting with REF + digit and then a line having PLU, same value as in Group 1, ,1=11 and the end of the line immediately to the right of the current location
| - or
^PLU(\d+),1=11(?:\R(?!REF\d).*)*?\RREF\2=\K\d+ - Cases where PLU is before REF. Analogous to the first part, but PLU and REF are swapped.
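If you need the same logic outside Notepad++, here is a rough Python sketch of the pattern. It is an adaptation, not the exact regex above: Python's built-in re module has no \K or \R, so the wanted digits are captured in a group instead and line breaks are assumed to be plain \n. The names PATTERN, ref_values and SAMPLE are made up for this illustration.

```python
import re

# Adapted pattern (an assumption-laden port, not the Notepad++ regex itself):
#   - \K is replaced by capture groups 2 and 4 that hold the wanted value
#   - \R is replaced by \n, so the input is assumed to use LF line endings
PATTERN = re.compile(
    r"^REF(\d+)=(\d+)(?=(?:\n(?!REF\d).*)*?\nPLU\1,1=11$)"  # REF before PLU
    r"|"
    r"^PLU(\d+),1=11(?:\n(?!REF\d).*)*?\nREF\3=(\d+)",       # PLU before REF
    re.MULTILINE,  # ^ and $ match at line boundaries, like the m flag
)


def ref_values(text):
    """Return the REF values whose matching PLU line is PLUXXX,1=11."""
    # Exactly one of groups 2 and 4 is set, depending on which branch matched.
    return [m.group(2) or m.group(4) for m in PATTERN.finditer(text)]


SAMPLE = """\
REF178=1234
OTHER RANDOM DATA LINE
PLU178,1=11
OTHER RANDOM DATA LINE
PLU179,1=11
REF179=5678
OTHER RANDOM DATA LINE
OTHER RANDOM DATA LINE
REF180=5432
PLU180,1=12
REF181=123432
"""

print(ref_values(SAMPLE))  # ['1234', '5678']
```

Like the original pattern, this sketch only handles ,1=11 and stops searching as soon as another REF line sits between the pair, so treat it as a starting point rather than a general solution.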
Upvotes: 1