Reputation: 163
Regex101 Tester: https://regex101.com/r/Yfp311/2
I am having difficulty getting the following regex pattern to work. For the sample text strings below, REF1 matches the entire line and ignores the optional REF2 group, which should match whenever "//[text]" appears in the line.
At the moment, the regex does not acknowledge the //[text] and incorrectly matches the entire text as REF1. I assume this is a consequence of greedy matching; however, I was unsuccessful with a non-greedy pattern, and lookahead/lookbehind did not appear to work either.
Any help or guidance would be greatly appreciated ... not sure what I am missing as I would think my current regex pattern should work without issue. Please let me know if I can clarify anything! Thank you!
^(?P<ID>[A-Z][A-Z0-9]{3})?(?P<REF1>.+)(//(?P<REF2>.+))?(\n?(?P<EXTRA>.+))?$
TEX1CNS0P5-AA//CAT-523-VID-00EOS-0
XUX PETER LAB RANDOM TEXT DM5.
TEX2BFTBSH9999SBRT2L
RATRACE201
TEX3GWS0P2-AA//D-14839048-99-3
THERE WAS 200 COALS IN HIS STOCKING.
Expected Matches:
Upvotes: 0
Views: 187
Reputation: 163
I ended up finding a better answer, because the provided regex patterns fail when the REF1 text contains a single forward slash ("/").
^(?P<ID>[A-Z][A-Z0-9]{3})?(?P<REF1>(?:(?!//).)+)(//(?P<REF2>.+))?(\n?(?P<EXTRA>.+))?$
For Example - https://regex101.com/r/Yfp311/4
TEX4POF OF 20/03/09//CAT342134832489
P/O:1600 PARK AVENUE
Using a negative lookahead in the regex pattern helped resolve this gap.
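For anyone who wants to reproduce the difference, here is a minimal Python sketch comparing the `[^/\n]+` version with the negative-lookahead version on the record above. The `re.MULTILINE` flag and the names `RECORD`, `NO_SLASH` and `TEMPERED` are assumptions for illustration; only the two patterns come from this thread.

```python
import re

# Sample record from this answer: REF1 itself contains single slashes.
RECORD = "TEX4POF OF 20/03/09//CAT342134832489\nP/O:1600 PARK AVENUE"

# REF1 as a negated character class: stops at the first "/" of any kind.
NO_SLASH = re.compile(
    r"^(?P<ID>[A-Z][A-Z0-9]{3})?(?P<REF1>[^/\n]+)"
    r"(//(?P<REF2>.+))?(\n?(?P<EXTRA>.+))?$",
    re.MULTILINE,  # assumption: the multiline flag is enabled
)

# REF1 with a negative lookahead: consume one character at a time,
# but only while the next two characters are not "//".
TEMPERED = re.compile(
    r"^(?P<ID>[A-Z][A-Z0-9]{3})?(?P<REF1>(?:(?!//).)+)"
    r"(//(?P<REF2>.+))?(\n?(?P<EXTRA>.+))?$",
    re.MULTILINE,
)

no_slash = NO_SLASH.search(RECORD).groupdict()
tempered = TEMPERED.search(RECORD).groupdict()

print(no_slash)  # REF1 is cut short at the first single "/"
print(tempered)  # REF1 runs up to the "//" separator
```

With the character class, REF1 ends at "POF OF 20" and the rest of the line spills into EXTRA. With the lookahead, REF1 is "POF OF 20/03/09" and REF2 is "CAT342134832489".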
Upvotes: 0
Reputation: 681
^(?P<ID>[A-Z][A-Z0-9]{3})?(?P<REF1>[^/\n]+)(//(?P<REF2>.+))?(\n?(?P<EXTRA>.+))?$
I have updated it. I guess it passes the required cases now:
https://regex101.com/r/Yfp311/3
The issue with the original implementation is that REF1 matches everything apart from line terminators, so it matched the // as well.
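To make that concrete, here is a small Python sketch that runs the original pattern and the updated one against the first sample record. The `re.MULTILINE` flag and the variable names are my assumptions; the patterns are copied from the question and from this answer.

```python
import re

# First record from the question (two lines).
RECORD = "TEX1CNS0P5-AA//CAT-523-VID-00EOS-0\nXUX PETER LAB RANDOM TEXT DM5."

# Original pattern: REF1 is a greedy ".+", so it runs to the end of the line
# and the optional "//REF2" group is simply skipped.
GREEDY = re.compile(
    r"^(?P<ID>[A-Z][A-Z0-9]{3})?(?P<REF1>.+)"
    r"(//(?P<REF2>.+))?(\n?(?P<EXTRA>.+))?$",
    re.MULTILINE,  # assumption: the multiline flag is enabled
)

# Updated pattern: REF1 may not contain "/" or a newline,
# so it has to stop in front of the "//" separator.
NO_SLASH = re.compile(
    r"^(?P<ID>[A-Z][A-Z0-9]{3})?(?P<REF1>[^/\n]+)"
    r"(//(?P<REF2>.+))?(\n?(?P<EXTRA>.+))?$",
    re.MULTILINE,
)

greedy = GREEDY.search(RECORD).groupdict()
fixed = NO_SLASH.search(RECORD).groupdict()

print(greedy["REF1"], greedy["REF2"])  # CNS0P5-AA//CAT-523-VID-00EOS-0 None
print(fixed["REF1"], fixed["REF2"])    # CNS0P5-AA CAT-523-VID-00EOS-0
```

Because every group after REF1 is optional, the engine never has a reason to backtrack out of the greedy `.+`, which is why making it lazy (`.+?`) alone does not help either.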
Upvotes: 2
Reputation: 798
How about this?
^(?P<ID>[A-Z][A-Z0-9]{3})?(?P<REF1>[^/\n]+)(//(?P<REF2>.+))?(\n?(?P<EXTRA>.+))?$
I think a hand-written parser is more achievable in this case.
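As a rough illustration of what such a parser could look like, here is a minimal Python sketch. It assumes each record is one main line plus an optional second line, and that the ID is exactly four characters (an uppercase letter followed by three uppercase letters or digits), as the regex implies. `Record` and `parse_record` are hypothetical names, not part of any library.

```python
import string
from typing import NamedTuple, Optional

_UPPER = set(string.ascii_uppercase)
_UPPER_OR_DIGIT = _UPPER | set(string.digits)


class Record(NamedTuple):
    id: Optional[str]
    ref1: str
    ref2: Optional[str]
    extra: Optional[str]


def parse_record(first_line: str, second_line: Optional[str] = None) -> Record:
    """Split one record into ID, REF1, REF2 and EXTRA without a regex."""
    head = first_line[:4]
    # ID: one uppercase letter followed by three uppercase letters or digits.
    has_id = (
        len(head) == 4
        and head[0] in _UPPER
        and all(ch in _UPPER_OR_DIGIT for ch in head[1:])
    )
    rest = first_line[4:] if has_id else first_line
    # Split on the first "//" only; single "/" characters stay inside REF1.
    ref1, sep, ref2 = rest.partition("//")
    return Record(
        id=head if has_id else None,
        ref1=ref1,
        ref2=ref2 if sep else None,
        extra=second_line or None,
    )


rec = parse_record("TEX4POF OF 20/03/09//CAT342134832489", "P/O:1600 PARK AVENUE")
print(rec.ref1, rec.ref2)  # POF OF 20/03/09 CAT342134832489
```

Unlike the regex, this sketch does not insist that REF1 is non-empty, so a line consisting of only a four-character ID would need extra handling.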
Upvotes: 1