Reputation: 152
Is it possible to place conditional statements around rules in flex? I need this in order to match a specific rule only when some condition is true. Something like this:
%option c++
%option noyywrap
%%
/*if (condition) {*/
[0-9] {}
/*}*/
[0-9]{2} {}
. /* eat up any unmatched character */
%%
#include <FlexLexer.h>
int main()
{
    FlexLexer *lexer = new yyFlexLexer();
    lexer->yylex();
    delete lexer;
}
Or is it possible to modify the final c++ generated code, in order to match only some specific regex rules?
UPDATE:
Using start conditions doesn't seem to help. What I want is to enable or disable a specific regex depending on some external variable (like isNthRegexActive). For example, if I have 4 regex rules and the 1st and 2nd are not active, the program should only check the other 2, and it should always check all of the active rules (don't stop at the first match; maybe use REJECT).
Example for 4 rules:
/* Declared at the top */
isActive[0] = false;
isActive[1] = false;
isActive[2] = true;
isActive[3] = true;
%%
[0-9]{4} { printf("1st: %s\n", yytext); REJECT;}
[0-3]{2}[0-3]{2} { printf("2nd: %s\n", yytext); REJECT; }
[1-3]{2}[0-3]{2} { printf("3rd: %s\n", yytext); REJECT; }
[1-2]{2}[1-2]{2} { printf("4th: %s\n", yytext); REJECT; }
.
%%
For the input: 1212
the result should be:
3rd: 1212
4th: 1212
Upvotes: 0
Views: 691
Reputation: 241931
Don't use REJECT unless you absolutely have no alternative. It massively slows down the lexical scan, artificially limits the size of the token buffer, and makes your lexical analysis very hard to reason about.
You might be thinking that a scanner generated by (f)lex tests each regular expression one at a time in order to select the best one for a given match. It does not work that way; that would be much too slow. What it does is, effectively, check all the regular expressions in parallel by using a precompiled deterministic state machine represented as a lookup table.
The state machine does exactly one transition for each input byte, using a simple O(1) lookup into the transition table. (There are different table compression techniques which trade off table size against the constant in the O(1) lookup, but that doesn't change the basic logic.) What that all means is that you can use as many regular expressions as you wish; the time to do the lexical analysis does not depend on the number or complexity of the regular expressions. (Except for caching effects: if your transition table is really big, you might start running into cache misses during the transition lookups. In such cases, you might prefer a compression algorithm which compresses more.)
In most cases, you can use start conditions to achieve conditional matching, although you might need a lot of start conditions if there are more than a few interacting conditions. Since the scanner can only have one active start condition, you'll need to generate a different start condition for each legal combination of the conditions you want to consider. That's usually most easily achieved through automatic generation of your scanner rules, but it can certainly be done by hand if there aren't too many.
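As an illustrative sketch (the condition names and patterns here are invented, not taken from your rules): two independently switchable rules need four exclusive start conditions, one per combination of the two flags, and each rule is listed under every combination in which it is active.

```lex
%option noyywrap
%x BOTH A_ONLY B_ONLY NONE
%%
<BOTH,A_ONLY>[0-9]{4}   { printf("rule A: %s\n", yytext); }
<BOTH,B_ONLY>[1-2]{4}   { printf("rule B: %s\n", yytext); }
<*>.|\n                 { /* eat up any unmatched character */ }
%%
/* Call this before scanning; BEGIN is only usable inside the
 * generated scanner file, so expose it through a helper. */
void set_active(int a_active, int b_active) {
    if (a_active && b_active) BEGIN(BOTH);
    else if (a_active)        BEGIN(A_ONLY);
    else if (b_active)        BEGIN(B_ONLY);
    else                      BEGIN(NONE);
}
```

Note that flex still fires only one rule per match (longest match, then earliest rule), so in the BOTH state an input matching both patterns reports only rule A; making every active rule fire on the same text, as in your example, is exactly the behavior that would otherwise require REJECT.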
It's hard to provide a more concrete suggestion without knowing what kinds of conditions you need to check.
Upvotes: 1