Reputation: 1141
In lex & yacc there is a macro called YY_INPUT that can be redefined, for example like this:
#define YY_INPUT(buf,result,maxlen) do { \
const int n = gzread(gz_yyin, buf, maxlen); \
if (n < 0) { \
int errNumber = 0; \
reportError( gzerror(gz_yyin, &errNumber)); } \
\
result = n > 0 ? n : YY_NULL; \
} while (0)
One of my grammar rules calls the YYACCEPT macro. If I call gztell (or ftell) after YYACCEPT, I get the wrong position, because the parser has already read more data than it needed.
So how can I get the current position when a rule calls YYACCEPT? (One bad solution would be to read the input character by character.)
(I have already tried something like this:
#define YY_USER_ACTION do { \
current_position += yyleng; \
} while (0)
but it does not seem to work.)
Upvotes: 2
Views: 5358
Reputation: 241861
You have to keep track of the offset yourself. A simple but annoying solution is to put:
offset += yyleng;
in every flex action. Fortunately, you can do this implicitly by defining the YY_USER_ACTION macro, which is executed just before the token action.
That might still not be right for your grammar, because bison often reads one token ahead. So you'll also need to attach the value of offset to each lexical token, most conveniently using the location facility (yylloc).
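To illustrate why a single global counter overshoots, here is a small simulation in plain C. It is not flex or bison output: the names `Token`, `lexer_init`, `next_token` and `parse_until_semicolon` are hypothetical, and the toy parser always fetches one token of lookahead, which bison only does when it needs to.

```c
#include <stddef.h>

/* Hypothetical token record: the lexer stamps each token with the
   offset just past its last character. */
typedef struct {
    int    kind;        /* 0 = end of input, otherwise the character itself */
    size_t end_offset;  /* offset just past this token */
} Token;

const char *input;   /* text being scanned */
size_t      offset;  /* global counter, playing the role of offset += yyleng */

void lexer_init(const char *text) {
    input = text;
    offset = 0;
}

/* Toy lexer: every non-space character is a token; spaces are skipped
   but still counted in the offset. */
Token next_token(void) {
    while (input[offset] == ' ')
        offset++;
    Token t = {0, offset};
    if (input[offset] != '\0') {
        t.kind = input[offset];
        offset++;
        t.end_offset = offset;
    }
    return t;
}

/* Toy parser with one token of lookahead. It accepts as soon as ';' is the
   current token and returns the end offset stored on that token. */
size_t parse_until_semicolon(void) {
    Token current = next_token();
    Token lookahead = next_token();
    while (current.kind != 0 && current.kind != ';') {
        current = lookahead;
        lookahead = next_token();
    }
    return current.end_offset;
}
```

After `lexer_init("a b; c d")`, `parse_until_semicolon()` returns 4, the offset just past the `;`, while the global `offset` has already advanced to 6 because the lookahead token `c` was scanned. The value carried on the token is the one you want.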
Edit: added more details on location tracking.
The following has not been tested. You should read the sections in both the flex and the bison manuals about location tracking.
The yylloc global variable and its default type are included in the generated bison code if you use the --locations command line option or the %locations directive, or if you simply refer to a location value in some rule, using the @ syntax, which is analogous to the $ syntax (that is, @n is the location value of the right-hand-side object whose semantic value is $n). Unfortunately, the default type for yylloc uses ints, which may not be wide enough to hold an offset into a large file, although you might not be planning on parsing files for which this matters. In any event, it's easy enough to change; you merely have to #define the YYLTYPE macro at the top of your bison file. The default YYLTYPE is:
typedef struct YYLTYPE
{
int first_line;
int first_column;
int last_line;
int last_column;
} YYLTYPE;
For a minimum modification, I'd suggest keeping the names unchanged; otherwise you'll also need to fix the YYLLOC_DEFAULT macro in your bison file. The default YYLLOC_DEFAULT ensures that non-terminals get a location value whose first_line and first_column members come from the first element in the non-terminal's RHS, and whose last_line and last_column members come from the last element. Since it is a macro, it will work with any assignable type for the various members, so it will be sufficient to change the column members to long, size_t or offset_t, as you feel appropriate:
#define YYLTYPE yyltype
typedef struct yyltype {
int first_line;
offset_t first_column;
int last_line;
offset_t last_column;
} yyltype;
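As a quick check that widening the column members does not disturb location merging, the sketch below uses a stand-in macro modeled on the nonempty-RHS case of the default YYLLOC_DEFAULT as described in the bison manual. It is not bison's generated code: `wide_loc`, `MERGE_LOCATION` and `merge_rhs` are hypothetical names, and `long long` merely stands in for whichever offset type you pick.

```c
/* Location type with widened column members (hypothetical name). */
typedef struct wide_loc {
    int       first_line;
    long long first_column;
    int       last_line;
    long long last_column;
} wide_loc;

/* Stand-in for the nonempty-RHS case of YYLLOC_DEFAULT: the start comes
   from the first RHS element and the end from the last one. As in bison,
   Rhs is indexed from 1. Only assignment is used, so the member types
   do not matter to the macro. */
#define MERGE_LOCATION(Cur, Rhs, N)                     \
    do {                                                \
        (Cur).first_line   = (Rhs)[1].first_line;       \
        (Cur).first_column = (Rhs)[1].first_column;     \
        (Cur).last_line    = (Rhs)[N].last_line;        \
        (Cur).last_column  = (Rhs)[N].last_column;      \
    } while (0)

/* Merge the locations rhs[1..n] into one location for the non-terminal. */
wide_loc merge_rhs(const wide_loc *rhs, int n) {
    wide_loc cur;
    MERGE_LOCATION(cur, rhs, n);
    return cur;
}
```

Feeding it two RHS locations whose columns exceed the range of a 32-bit int shows the values surviving the merge unchanged, which is the property the real macro relies on.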
Then in your flex input, you could define the YY_USER_ACTION macro:
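/* Running offset of the end of the most recently scanned token. */
offset_t offset;
extern YYLTYPE yylloc;
/* flex maintains yylineno only when %option yylineno (or lex compatibility) is enabled. */
#define YY_USER_ACTION \
offset += yyleng; \
yylloc.last_line = yylineno; \
yylloc.last_column = offset;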
With all that done and appropriate initialization, you should be able to use the appropriate @n.last_column in the rule that calls YYACCEPT to extract the offset of the end of the last token in the accepted input.
Upvotes: 7