Reputation: 2749
In the generated C code from Flex, I see that YY_BUFFER_STATE is declared as follows:
struct yy_buffer_state
{
    ...
    int yy_bs_lineno; /**< The line count. */
    int yy_bs_column; /**< The column count. */
    ...
}
I would like to announce lexing errors to the user along with the line number and column of the incorrect token, but the yy_bs_lineno and yy_bs_column fields aren't touched anywhere after being initialized to 1 and 0, respectively. Do I need to increment these myself somewhere in my lexing definitions?
Upvotes: 1
Views: 1609
Reputation: 241791
The data members of the flex buffer state are private; you shouldn't try to use them. In particular, the yy_bs_lineno and yy_bs_column members are used by flex in reentrant scanners.
If you use:
%option yylineno
then flex will keep track of the current line number in the variable yylineno.
Note that yylineno is the line number of the line of the first character following the token. So if the token includes (or even ends with) a newline character, yylineno's value will be a little deceptive. If you have multi-line tokens (multi-line string constants, for example) then it's worthwhile keeping the previous value of yylineno around.
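As an alternative to saving the previous value, the line on which a token started can be recovered by subtracting the number of newlines in the token from the line number after it. Here is a minimal sketch in plain C, assuming %option yylineno is in effect and that neither yyless() nor unput() has been applied to the token. The name token_start_line is a hypothetical helper, not part of flex:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of flex): given the matched text, its
 * length, and the line number reported after the match, return the line
 * on which the token began. */
int token_start_line(const char *text, size_t len, int line_after) {
    assert(text != NULL);
    int newlines = 0;
    for (size_t i = 0; i < len; ++i) {
        if (text[i] == '\n') ++newlines;
    }
    /* Each newline inside the token advanced the line counter by one. */
    return line_after - newlines;
}
```

Inside an action this would presumably be called as token_start_line(yytext, (size_t)yyleng, yylineno).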
Flex is pretty clever about line counting. It knows which tokens cannot match newline characters, for example, so it doesn't need to rescan input after it finds one of those tokens. It's almost always a good idea to let flex do the work for you.
Unfortunately, there is no similarly simple way of tracking columns, but there are a couple of things which help. One of them is the macro hook YY_USER_ACTION. If you define this macro, it will be executed just before every action, so you can use it to keep your line and column information in sync.
Here's the simplest example I could cut-and-paste:
%{
#include <stdio.h>

/* WARNINGS:
 *   1. Reentrant scanners define yycolumn
 *      Only use this in a non-reentrant scanner
 *   2. This will not work if you use `yyless()` or `yymore()`.
 */
int yycolumn = 1;

/* Forward declarations */
void report(const char* ttype, int line, int column);

/* This is executed before every action. */
#define YY_USER_ACTION                                                   \
  start_line = prev_yylineno; start_column = yycolumn;                   \
  if (yylineno == prev_yylineno) yycolumn += yyleng;                     \
  else {                                                                 \
    for (yycolumn = 1; yytext[yyleng - yycolumn] != '\n'; ++yycolumn) {} \
    prev_yylineno = yylineno;                                            \
  }
%}
%option noyywrap nounput noinput
%option yylineno

%%
  /* Any indented text before the first rule goes at the top of the lexer. */
  int start_line, start_column;
  int prev_yylineno = yylineno;

[[:space:]]+                { }
[[:alpha:]_][[:alnum:]_]*   { report("ID", start_line, start_column);
                              return 258;
                            }
["]([^"]|\\.)*["]           { report("STR", start_line, start_column);
                              return 259;
                            }
.                           { report("SYM", start_line, start_column);
                              return yytext[0];
                            }
%%
void report(const char* t, int l, int c) {
  printf("Encountered %s \"%.*s\" at %d:%d\n", t, (int)yyleng, yytext, l, c);
}

int main(int argc, char** argv) {
  while (yylex() > 0) {}
  return 0;
}
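The column update performed by YY_USER_ACTION above can also be written as an ordinary function, which makes it easier to check outside the scanner. This is a sketch of the same logic rather than a drop-in replacement. The name next_column is hypothetical, and columns are assumed to be 1-based and counted in bytes, as in the example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the column at which a token starts and the
 * token's text, return the column of the first character after the token. */
int next_column(int start_column, const char *text, size_t len) {
    assert(text != NULL);
    /* Search backwards for the last newline in the token. */
    for (size_t i = len; i > 0; --i) {
        if (text[i - 1] == '\n') {
            /* Characters after the last newline sit on the new line,
             * which restarts at column 1. */
            return 1 + (int)(len - i);
        }
    }
    /* No newline: the token stays on one line. */
    return start_column + (int)len;
}
```

A token that ends in a newline leaves the column at 1, which matches what the for loop in the macro computes.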
Upvotes: 4