Reputation: 4333
I am trying to match text inside %[ and ]% in single or multiple lines. The first thing I tried was:
\%\[(.*?)\]\% return MULTILINE_TEXT;
but this works only for single-line cases, not for multiple lines. So I thought I could use the /s flag:
/\%\[(.*?)\]\%/s return MULTILINE_TEXT;
But flex sees this as an invalid rule. The last thing I tried was:
\%\[((.*?|\n)*?)\]\% return MULTILINE_TEXT;
which seemed to work, but it doesn't stop at the first ]%. In the following example:
%[ Some text ...
Some text ... ]%
... other stuff ...
%[ Some more text ...
Some more text ... ]%
flex will return the entire thing as a single token. What can I do?
Upvotes: 0
Views: 525
Reputation: 241771
Note that *? is not treated as a non-greedy match by flex.
Flex does support some regex flags, but its syntax is a little different from that of most regex libraries. For example, you can change the meaning of . by setting the s flag; the change applies to the region within the parentheses (and not to whatever follows the flag setting, as in PCRE):
"%["(?s:.*)"%]"
It's more common to see the lex-compatible usage:
"%["(.|\n)*"%]"
You can also use the x flag for slightly more readable regexes:
(?xs: "%[" .* "%]" )
(The x flag does not work in definitions, only in pattern rules.)
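As a rough illustration of the flag syntax, here is a minimal sketch of a complete scanner built around the x/s form. The surrounding scaffolding (the options, the catch-all rule, and main) is an assumption added for this sketch, not part of the patterns above. Note that .* under the s flag is still greedy, so this rule on its own runs from the first %[ to the last %] in the input.

```lex
%option noyywrap nodefault

%{
#include <stdio.h>
%}

%%

(?xs: "%[" .* "%]" )    { printf("matched %d characters\n", (int)yyleng); }
.|\n                    { /* ignore anything that is not part of a block */ }

%%

int main(void)
{
    yylex();   /* returns 0 at end of input */
    return 0;
}
```

Given the input %[a%] x %[b%], this should report one match covering the whole 13-character string rather than two separate blocks, which is the same behaviour the question ran into.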
Quoted strings (as above) are another (f)lex-specific syntax, which can be more readable than backslash escapes, although backslash escapes also work. But flex does not implement PCRE/GNU/JS extensions such as \w and \s.
See the flex manual for a complete guide to flex regexes; it's definitely worth reading if you are used to other regex syntaxes.
You will probably find it disappointing that (f)lex does not support many common regex extensions, including non-greedy matches. That makes it awkward to write patterns for tokens terminated by a multi-character delimiter, as in your example. If the delimiters %[ and %] cannot be nested, so that you really want the match to end with the first %], you could use something like this:
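%\[([^%]|%+[^%\]])*%+\]
or
(?x: "%[" ( [^%] | %+ [^%\]] )* %* "%]" )
That's a bit hard to read, but it is precise: %[ followed by any number of repetitions of either a character other than % or a run of % followed by something other than % or ], ending with a run of % followed by a ]. (The character after the run must exclude % as well as ]; otherwise a run such as %%] could be split into %% plus a stray ], and the match would continue past the first %].)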
In the above pattern, you need %+ rather than % to deal with strings like:
%[%% text surrounded by percents%%%]
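To see the non-nested pattern in context, here is a minimal sketch of a complete scanner. MULTILINE_TEXT is the token name from the question. Its numeric value, the catch-all rule, and main are assumptions made for this sketch (in a real project the token would normally come from the parser's header). The second alternative uses [^%\]] so that the character following a run of % can be neither % nor ].

```lex
%option noyywrap nodefault

%{
#include <stdio.h>
/* Placeholder token value, chosen arbitrarily for this sketch. */
enum { MULTILINE_TEXT = 258 };
%}

%%

"%["([^%]|%+[^%\]])*%+"]"   { return MULTILINE_TEXT; }
.|\n                        { /* ignore text outside the delimiters */ }

%%

int main(void)
{
    /* yylex() returns 0 at end of input, which ends the loop. */
    while (yylex() == MULTILINE_TEXT)
        printf("token: %s\n", yytext);
    return 0;
}
```

Built with flex and a C compiler, this should print each %[ ... %] block as its own token, including blocks that span several lines. The question closes its blocks with ]% rather than %]. For that delimiter the same idea applies with the roles of the characters swapped; an untested adaptation would be "%["([^\]]|\]+[^\]%])*\]+"%".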
A more readable solution, which also allows for nested %[, is to use start conditions. There's a complete example of a very similar solution in this answer.
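The start-condition approach might look roughly like the following. This is a sketch written for illustration, not the code from the linked answer. The condition name BLOCK, the depth counter, and the token value are all assumptions. It uses yymore() so that yytext holds the whole block, from the opening %[ to the matching %], when the token is returned.

```lex
%option noyywrap nodefault

%{
#include <stdio.h>
enum { MULTILINE_TEXT = 258 };   /* placeholder token value (assumption) */
static int depth = 0;            /* current nesting level of %[ */
%}

%x BLOCK

%%

"%["        { depth = 1; BEGIN(BLOCK); yymore(); }
.|\n        { /* ignore text outside the delimiters */ }

<BLOCK>{
    "%["    { ++depth; yymore(); }
    "%]"    {
                if (--depth == 0) {
                    BEGIN(INITIAL);
                    return MULTILINE_TEXT;   /* yytext is the full block */
                }
                yymore();
            }
    [^%]+   { yymore(); }   /* ordinary text, including newlines */
    "%"     { yymore(); }   /* a lone % that starts no delimiter */
    <<EOF>> { fprintf(stderr, "unterminated block\n"); return 0; }
}

%%

int main(void)
{
    while (yylex() == MULTILINE_TEXT)
        printf("token: %s\n", yytext);
    return 0;
}
```

Because BLOCK is declared exclusive with %x, the catch-all rule is inactive inside a block. The two-character delimiter rules win over the single % rule because flex prefers the longest match. If nesting is not wanted, the depth counter can be dropped and the inner "%[" rule removed.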
Upvotes: 6