gorilon

Reputation: 434

Flex pattern definition over multiple lines

I am defining a very long pattern in flex, with many alternatives joined by |. Is there a way to spread the definition over several lines to make the code more readable? Something like

    %option noyywrap
  1 %{
 ...
 14 %}
 15 
 16 DIGIT           [0-9]
 17 ID              [a-z][a-z0-9]*
 18 LOOP_KWD        for|while|
 19                 his|her                        //THIS IS WHAT I WOULD LIKE
 20 SELECT_KWD      if|else
 21 STRING          \".*\"
 22 COMP_OP         <|>|==]
 29 
 30 %%
 31 
 32 {DIGIT}+                {
 33                                                 printf("INT_NUM<%s>", yytext);
 34                                 }
 35 
 36 {INCLUDE}                       {
 37                                                 printf("PREPROCESSOR_INCLUDE");
 38                                 }
 39 {LOOP_KWD}              {
 40                                                 printf("LOOP_KWD<%s>", yytext);
 41                                 }
 42 {SELECT_KWD}            {
 43                                                 printf("SELECT_KWD<%s>", yytext);
 44                                 }

When I try to run this, it gives:

    flex -o tokenize.c my_first_token.l
    my_first_token.l:40: unrecognised rule
    make: *** [all] Error 1

Upvotes: 2

Views: 1375

Answers (2)

Thomas Dickey

Reputation: 54505

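lex and flex do not accept continuation lines in the pattern definitions, but they do let consecutive rules in the rules section share one action: an action consisting solely of | means "same as the action for the next rule". Keep whitespace between each pattern and the |, so that it is read as an action and not as part of the pattern. You could change that rule to something like

for      |
while    |
his      |
her      {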
             printf("LOOP_KWD<%s>", yytext);
         }

although I find it preferable to use a lookup table, with the lexer concerned only with syntax. You have a pattern for {ID} which can be used, e.g.,

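{ID}    {
           int n;
           /* table[] ends with a 0 sentinel */
           for (n = 0; table[n] != 0; ++n) {
               /* compare the whole token, not a prefix or substring */
               if (!strcmp(yytext, table[n])) {
                   printf("keyword<%s>", yytext);
                   break;
               }
           }
           /* no match: yytext is an ordinary identifier */
        }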

and the table (in a code section that the rules can see, e.g. the %{ ... %} block at the top, which also needs #include <string.h> for strcmp):

const char *table[] = { "for", "while", "his", "her", 0 };

Using the pattern {ID} rather than the explicit keywords fixes the problem of spurious matches, e.g., "this" matching "his", "forth" matching "for", etc.
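To make the lookup idea concrete outside of flex, here is a minimal sketch in plain C. It assumes you want to keep the two categories from the question (LOOP_KWD and SELECT_KWD); the names kw_entry, kw_table and classify_word are made up for this illustration and are not part of flex.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: each keyword spelling is paired with the category
   name that the question prints for it. */
struct kw_entry {
    const char *word;
    const char *category;
};

static const struct kw_entry kw_table[] = {
    { "for",   "LOOP_KWD"   },
    { "while", "LOOP_KWD"   },
    { "his",   "LOOP_KWD"   },
    { "her",   "LOOP_KWD"   },
    { "if",    "SELECT_KWD" },
    { "else",  "SELECT_KWD" },
    { NULL,    NULL         }   /* sentinel marks the end of the table */
};

/* Return the category of a complete identifier, or "ID" when the word is
   not in the table. strcmp compares the whole string, so "this" and
   "forth" are not mistaken for "his" and "for". */
const char *classify_word(const char *word)
{
    assert(word != NULL);
    for (size_t n = 0; kw_table[n].word != NULL; ++n) {
        if (strcmp(word, kw_table[n].word) == 0)
            return kw_table[n].category;
    }
    return "ID";
}
```

Inside the {ID} action you could then write something like printf("%s<%s>", classify_word(yytext), yytext); provided the function is declared before the rules.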

Upvotes: 2

user207421

Reputation: 310893

The actual problem is the multiline definition of LOOP_KWD on lines 18-19, and the simple answer is that you can't do that.

The more complex answer is that every keyword should have its own rule. Otherwise every keyword in the group reaches the parser as the same token, and the parser can't tell them apart. So you shouldn't even be trying. The alternative is not to recognize keywords with rules at all, and just use a lookup table in the IDENTIFIER rule.
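A rough sketch of that second option, assuming a parser that expects a distinct token code per keyword. The enum values and the names token_kind, token_table and token_for_word are invented for this example; with bison you would use the token codes from its generated header instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes, one per keyword, plus one for identifiers. */
enum token_kind { TOK_IDENTIFIER = 1, TOK_FOR, TOK_WHILE, TOK_IF, TOK_ELSE };

struct token_entry {
    const char *spelling;
    enum token_kind kind;
};

static const struct token_entry token_table[] = {
    { "for",   TOK_FOR   },
    { "while", TOK_WHILE },
    { "if",    TOK_IF    },
    { "else",  TOK_ELSE  },
};

/* Map a complete identifier to its own keyword token, or to
   TOK_IDENTIFIER when it is not a keyword. */
enum token_kind token_for_word(const char *word)
{
    assert(word != NULL);
    size_t count = sizeof token_table / sizeof token_table[0];
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(word, token_table[i].spelling) == 0)
            return token_table[i].kind;
    }
    return TOK_IDENTIFIER;
}
```

The IDENTIFIER rule then shrinks to a single action such as return token_for_word(yytext); and adding a keyword means adding one table row, not touching the patterns.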

Upvotes: 1
