Reputation: 23
I want to create a regex that returns everything between two multicharacter tokens where the opening token is ;;(
and the closing token is ;;)
, such as
;;(
Capture this part, which can contain everything except the closing token
;;)
I thought the regex /;;\((?!;;\));;\)/
using negative lookahead should work but this is returning no matches. Is it possible to use a regex for this?
Upvotes: 2
Views: 171
Reputation: 626853
To match text between two multicharacter delimiters, you can use a regex built with the unroll-the-loop technique.
So, we have ;;(
and ;;)
delimiters.
The lazy dot matching regex is ;;\((.*?);;\)
. This pattern is inefficient: the lazy quantifier expands one character at a time and retries the closing delimiter at every position, so it gets slower as the input grows.
Unrolling it like ;;\(([^;]*(?:;(?!;\))[^;]*)*);;\)
makes matching linear, because each run of non-; characters is consumed in a single step. Speed can only become a problem if there are many ;
inside the block.
It takes timgeb's solution 169 steps to complete the match. It takes mine just 16 steps.
Also, the unrolled regex does not depend on the /s
DOTALL modifier, so the modifier can be omitted: a negated character class such as [^;] matches line breaks as well.
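To make the comparison concrete, here is a minimal Python sketch (Python is my choice for illustration, and the sample text is made up). It runs both patterns on a multi-line block and shows that the lazy dot version only works with the DOTALL flag, while the unrolled one needs no flag.

```python
import re

# Sample input is an assumption for illustration; the block spans several lines.
text = "before ;;(\nfirst line; still inside\nsecond line\n;;) after"

# Lazy dot version: needs re.DOTALL so that . also matches newlines.
lazy = re.compile(r";;\((.*?);;\)", re.DOTALL)

# Unrolled version: negated character classes match newlines, so no flag is needed.
unrolled = re.compile(r";;\(([^;]*(?:;(?!;\))[^;]*)*);;\)")

lazy_match = lazy.search(text)
unrolled_match = unrolled.search(text)

print(repr(lazy_match.group(1)))
print(repr(unrolled_match.group(1)))

# Without re.DOTALL the lazy pattern cannot cross the line breaks.
no_flag_match = re.search(r";;\((.*?);;\)", text)
print(no_flag_match)
```

Both compiled patterns capture the same text here; the difference is in how many steps the engine needs and in whether a flag is required.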
Why not use lookarounds? Lookarounds are useful when you need overlapping matches or have to check specific conditions. Here you need non-overlapping matches because the leading and trailing delimiters are different. Use capturing groups instead, i.e. pairs of unescaped parentheses around the subpatterns you want to extract. In ;;\(([^;]*(?:;(?!;\))[^;]*)*);;\)
, we need to get all the text up to the first ;;)
, i.e. this [^;]*(?:;(?!;\))[^;]*)*
part. Thus, we enclose it with ()
.
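As a rough illustration of the capturing group, the Python sketch below (the input string is invented for the example) collects the contents of several blocks. With a single capturing group, re.findall returns just the captured parts, without the delimiters.

```python
import re

# Unrolled pattern from this answer; group 1 holds the text between the delimiters.
pattern = re.compile(r";;\(([^;]*(?:;(?!;\))[^;]*)*);;\)")

# Hypothetical input with two blocks; the second one contains stray semicolons.
text = ";;(one;;) noise ;;(two; with ;; inside;;)"

# With exactly one capturing group, findall returns a list of group-1 strings.
captures = pattern.findall(text)
print(captures)
```

Single and double semicolons inside a block are kept, since only a ; that is followed by ;) ends the capture.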
What does this unrolled part match?
[^;]* - zero or more characters other than ; (the first char of the trailing delimiter)
(?:;(?!;\))[^;]*)* - zero or more sequences of...
    ;(?!;\)) - a literal ; (the first char of the trailing delimiter) that is not followed by ;) (the rest of the trailing delimiter)
    [^;]* - zero or more characters other than ; (the first char of the trailing delimiter)
Upvotes: 2
Reputation: 78690
Use a positive lookbehind and positive lookahead.
(?<=;;\().*?(?=;;\))
Demo: https://regex101.com/r/iK5wG4/2
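For reference, a small Python sketch of the same pattern (the sample text is made up, and re.DOTALL is my addition so that . can cross the line breaks in the asker's example). Since the delimiters sit inside lookarounds, the wanted text is the whole match rather than a group.

```python
import re

# Sample input is an assumption; the block spans several lines like in the question.
text = "x ;;(\nCapture this part\n;;) y"

# Lookbehind and lookahead assert the delimiters without consuming them.
# re.DOTALL is needed here because . must match the newlines inside the block.
lookaround = re.compile(r"(?<=;;\().*?(?=;;\))", re.DOTALL)

m = lookaround.search(text)
print(repr(m.group(0)))
```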
Upvotes: 0