Vortico

Reputation: 2749

How do I use `yy_bs_lineno` and `yy_bs_column` in Flex?

In the C code generated by Flex, I see that struct yy_buffer_state (the struct behind YY_BUFFER_STATE) is declared as follows:

struct yy_buffer_state
{
    ...
    int yy_bs_lineno; /**< The line count. */
    int yy_bs_column; /**< The column count. */
    ...
};

I would like to announce lexing errors to the user along with the line number and column of the incorrect token, but the yy_bs_lineno and yy_bs_column fields aren't touched anywhere after being initialized to 1 and 0, respectively. Do I need to increment these myself somewhere in my lexing definitions?

Upvotes: 1

Views: 1609

Answers (1)

rici

Reputation: 241791

The data members of the flex buffer state are private; you shouldn't try to use them. In particular, the yy_bs_lineno and yy_bs_column members are used by flex in reentrant scanners.

If you use:

%option yylineno

then flex will keep track of the current line number in the variable yylineno.

Note that yylineno is the line number of the line of the first character following the token. So if the token includes (or even ends with) a newline character, yylineno's value will be a little deceptive. If you have multi-line tokens (multi-line string constants, for example) then it's worthwhile keeping the previous value of yylineno around.
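As a rough illustration of recovering the line on which a token began: if flex has already advanced yylineno past every newline in the matched text, the starting line is yylineno minus the number of newlines in the token. The helper names below (count_newlines, token_start_line) are hypothetical and not part of flex, and the sketch assumes the action has not called yyless() or yymore().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of flex: count the newlines in a token. */
int count_newlines(const char *text, size_t len) {
    assert(text != NULL);
    int n = 0;
    for (size_t i = 0; i < len; ++i) {
        if (text[i] == '\n') ++n;
    }
    return n;
}

/* Line on which the token began, given the line count after the match
 * (that is, the value yylineno would hold inside the action). */
int token_start_line(const char *text, size_t len, int line_after_match) {
    return line_after_match - count_newlines(text, len);
}
```

Inside an action this would be called as token_start_line(yytext, yyleng, yylineno). It rescans the token, so saving the previous yylineno is cheaper when most tokens are long.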

Flex is pretty clever about line counting. It knows which tokens cannot match newline characters, for example, so it doesn't need to rescan input after it finds one of those tokens. It's almost always a good idea to let flex do the work for you.

Unfortunately, there is no similarly simple way of tracking columns, but a couple of things help. One of them is the macro hook YY_USER_ACTION: if you define this macro, it is executed just before every rule's action, so you can use it to keep your line and column information in sync.

Here's the simplest example I could cut-and-paste:

%{
#  include <stdio.h>

/* WARNINGS:
 *   1. Reentrant scanners define yycolumn
 *      Only use this in a non-reentrant scanner
 *   2. This will not work if you use `yyless()` or `yymore()`.
 */
int yycolumn = 1;

/* Forward declarations */
void report(const char* ttype, int line, int column);

/* This is executed before every action. */
#define YY_USER_ACTION                                                   \
  start_line = prev_yylineno; start_column = yycolumn;                   \
  if (yylineno == prev_yylineno) yycolumn += yyleng;                     \
  else {                                                                 \
    for (yycolumn = 1; yytext[yyleng - yycolumn] != '\n'; ++yycolumn) {} \
    prev_yylineno = yylineno;                                            \
  }

%}
%option noyywrap nounput noinput
%option yylineno

%%
   /* Any indented text before the first rule goes at the top of the lexer.  */
   int start_line, start_column;
   int prev_yylineno = yylineno;

[[:space:]]+              { }
[[:alpha:]_][[:alnum:]_]* { report("ID", start_line, start_column);
                            return 258; 
                          }
["]([^"\\]|\\.)*["]       { report("STR", start_line, start_column);
                            return 259;
                          }
.                         { report("SYM", start_line, start_column);
                            return yytext[0];
                          }
%%
void report(const char* t, int l, int c) {
  printf("Encountered %s \"%.*s\" at %d:%d\n", t, (int)yyleng, yytext, l, c);
}

int main(int argc, char** argv) {
  while (yylex() > 0) {}
  return 0;
}
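The column update in YY_USER_ACTION can also be read in isolation. The following is only a sketch of the same logic as an ordinary C function; struct position and advance_position are hypothetical names used here for illustration. It assumes, as the macro does, that the line count changes only when the token contains a newline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: the position of the next unread character. */
struct position { int line; int column; };

/* Advance *pos past a token of len characters. line_after is the line
 * count once the token has been matched (yylineno in the scanner). */
void advance_position(struct position *pos, const char *text, size_t len,
                      int line_after) {
    assert(pos != NULL && text != NULL);
    if (line_after == pos->line) {
        /* No newline in the token: just move right. */
        pos->column += (int)len;
    } else {
        /* Walk back from the end of the token to its last newline; the
         * distance walked is the 1-based column of the next character. */
        size_t back = 1;
        while (text[len - back] != '\n') ++back;
        pos->column = (int)back;
        pos->line = line_after;
    }
}
```

For example, starting at 1:1, the token "ab\ncd" (five characters, with line_after equal to 2) leaves the position at 2:3, the column just after the "d".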

Upvotes: 4
