I am defining a very long pattern in flex, with many alternatives separated by |. Is there a way to write the definition over several lines to improve readability? Something like:
%option noyywrap
1 %{
...
14 %}
15
16 DIGIT [0-9]
17 ID [a-z][a-z0-9]*
18 LOOP_KWD for|while|
19 his|her //THIS IS WHAT I WOULD LIKE
20 SELECT_KWD if|else
21 STRING \".*\"
22 COMP_OP <|>|==
29
30 %%
31
32 {DIGIT}+ {
33 printf("INT_NUM<%s>", yytext);
34 }
35
36 {INCLUDE} {
37 printf("PREPROCESSOR_INCLUDE");
38 }
39 {LOOP_KWD} {
40 printf("LOOP_KWD<%s>", yytext);
41 }
42 {SELECT_KWD} {
43 printf("SELECT_KWD<%s>", yytext);
44 }
When I run flex on this, it gives:
flex -o tokenize.c my_first_token.l
my_first_token.l:40: unrecognised rule
make: *** [all] Error 1
Upvotes: 2
Views: 1375
Reputation: 54505
lex and flex do not accept continuation lines in the pattern definitions, but they do allow line breaks in the rules section. You could drop the LOOP_KWD definition and change its rule to something like
for     |
while   |
his     |
her     {
            printf("LOOP_KWD<%s>", yytext);
        }
although I find it preferable to use a lookup table, keeping the lexer concerned only with syntax. You already have a pattern for {ID}, which can be used, e.g.,
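{ID}    {
            int n;
            /* compare the whole identifier with each entry; the table ends with a null pointer */
            for (n = 0; table[n] != 0; ++n) {
                if (!strcmp(yytext, table[n])) {
                    printf("keyword<%s>", yytext);
                    break;
                }
            }
            if (table[n] == 0)      /* no match: an ordinary identifier */
                printf("ID<%s>", yytext);
        }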
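and the table, which must be declared before the rules use it, e.g., in the %{ ... %} block of the definitions section (the user-code section after the second %% is copied after yylex(), too late for the action to see it):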
const char *table[] = { "for", "while", "his", "her", 0 };
Using the pattern {ID} rather than the explicit keywords fixes the problem of spurious matches, e.g., "his" being found inside "this", or "for" at the start of "forth".
Upvotes: 2
Reputation: 310893
The actual problem is the multiline definition of LOOP_KWD on lines 18-19, and the simple answer is that you can't do that.
The more complex answer is that every keyword should have its own rule, returning its own token. Otherwise the parser can't work: if "for" and "while" both come back as LOOP_KWD, it cannot tell them apart. So you shouldn't even be trying. Alternatively, don't bother recognizing keywords with rules at all, and just use a lookup table in the IDENTIFIER rule.
Upvotes: 1