Arthur Putnam

Reputation: 1101

In bison is there a way to return the Name of a token instead of its type

I am working with Flex and Bison. In my parser.y (Bison) I define tokens. When a token is returned, it is returned as an int. I was wondering if there is a way to take that int and map it back to the actual name in the Bison source. For example, in my parser.y:

//define my tokens that are shared with my lexer (flex)
%token <tokenData> ID
%token <tokenData> NUMCONST

In my grammar I then use:

number : NUMCONST   {std::cout<<"Line "<<$1->linenum<<" Token: [I want NUMCONST]"<<std::endl;}

I know I can display the int that is returned from the lexer, but is there a way to return the token's name, such as "NUMCONST" or "ID"? I want the token "type" instead of the token "int".

Upvotes: 3

Views: 2962

Answers (2)

rici

Reputation: 241671

Yes you can, but you need to enable the feature in your Bison file.

Up to version 3.6.3 (or so), you could put a function like this in the code segment at the end of your Bison source:

const char* token_name(int t) {
    return yytname[YYTRANSLATE(t)];
}

Since v3.6, you are expected to use something like the following:

const char* token_name(int t) {
    return yysymbol_name(YYTRANSLATE(t));
}

Either way, token_name must be defined in the parser source file, because yytname is a static variable in the generated Bison code, yysymbol_name is a static function, and YYTRANSLATE is a macro (which references another static variable). But if you define it with external linkage, you can use it in any source file which includes its declaration.

In order for yytname to actually be present in the compiled code, you needed to request that the necessary tables be compiled into your parser, by using the %token-table directive or the -k/--token-table command line flag. Alternatively, you could request that the parser be generated with debugging code, using the -t/--debug command line flag; or with the Bison directive %debug (now deprecated) or its replacement, %define parse.trace; or by defining YYDEBUG to a non-zero value (for example, #define YYDEBUG 1) in a code block in your Bison prologue.

In v3.6, yytname was declared "obsolescent" (which I think is not quite the same as deprecated) and the documentation now indicates that %token-table is incompatible with the new custom and detailed values for %define parse.error. Consequently, it is recommended that the interface yysymbol_name(t) be used instead of yytname[t]. However, without using %token-table, the only way of ensuring that the yysymbol_name interface is available is to request debugging code, as above. (Or you can use one of the new values for %define parse.error, but that has other consequences.)

yytname (or its replacement) is an array of character strings indexed by the parser's internal symbol number. That's not the same as the externally-defined token number, which is what is defined in the generated header file and what the parser expects to be returned from yylex.

External token numbers are a sparse encoding. Token number 0 indicates end-of-file. Token numbers 1 through 255 are used to implement single-character tokens, written between single quotes in your grammar (for example, expr: expr '+' expr). Token number 256 is used by the generated parser to indicate the error pseudo-token, and 257 is used to replace invalid token numbers. (These should never be returned by yylex.) The other tokens are assigned values starting at 258.

Internally, Bison uses a dense recoding, called yysymbol_t in recent Bison versions. Symbol 0 still means end-of-file; symbols 1 and 2 are error and undefined token (token numbers 256 and 257), and those are followed sequentially by all the tokens actually used in the grammar. (So the only single-character tokens which have a symbol number are the ones mentioned explicitly in some grammar rule; any other small integer returned by yylex will be recoded to symbol 2, the undefined token.) Bison also gives symbol numbers to non-terminals, including the $accept pseudo-non-terminal; these start immediately after the token numbers.

The YYTRANSLATE macro checks to make sure its argument is in range for token numbers, and then uses the yytranslate table to translate the token number into a symbol number. This macro and the table are not documented anywhere, but they're in the generated parser code. (Unless you %define api.token.raw, in which case the token number and the symbol number are the same for every token, and you cannot use single-quoted single-character tokens.)
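To make the two numberings concrete, here is a minimal stand-alone sketch of the translation step. Nothing in it is generated by Bison: translate, kSymbolNames and the specific symbol numbers are hypothetical stand-ins for YYTRANSLATE and yytname, written by hand for a grammar whose only tokens are '+', ID and NUMCONST. The name strings are illustrative too; the exact text Bison uses for the pseudo-tokens varies between versions.

```cpp
#include <cassert>
#include <string>

// External (sparse) token numbers, as they would appear in the generated header.
enum ExternalToken { ID = 258, NUMCONST = 259 };

constexpr int kMaxToken = 259;    // largest external token number in this grammar
constexpr int kUndefSymbol = 2;   // internal symbol for "undefined token"

// Hand-written stand-in for yytname, indexed by internal (dense) symbol number.
static const char* const kSymbolNames[] = {
    "end of file", "error", "invalid token", "'+'", "ID", "NUMCONST"
};
constexpr int kSymbolCount = sizeof kSymbolNames / sizeof kSymbolNames[0];

// Stand-in for YYTRANSLATE: range check, then sparse -> dense recoding.
int translate(int token) {
    if (token < 0 || token > kMaxToken) return kUndefSymbol;
    switch (token) {
        case 0:        return 0;  // end-of-file
        case 256:      return 1;  // error pseudo-token
        case 257:      return 2;  // undefined token
        case '+':      return 3;  // the only character token this grammar uses
        case ID:       return 4;
        case NUMCONST: return 5;
        default:       return kUndefSymbol;  // e.g. '-' is not in the grammar
    }
}

// Same shape as the token_name function shown earlier in this answer.
std::string token_name(int token) {
    const int symbol = translate(token);
    assert(symbol >= 0 && symbol < kSymbolCount);
    return kSymbolNames[symbol];
}
```

With these tables, token_name(NUMCONST) gives "NUMCONST", while token_name('-') gives "invalid token" because '-' never appears in the hypothetical grammar.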

The token names in the yytname table are the token aliases (double-quoted strings in the Bison grammar), if you use that feature. For example, if your grammar included:

%token EQEQ "=="
%%
exp: exp "==" exp
   | exp '+' exp

the name strings for the tokens corresponding to the two operators in the exp rules are "==" and '+'.

If a token alias includes a character which must be backslash-escaped, the backslash will also be in the yytname string. (Part of the purpose of making yytname obsolescent is to remove the need to unescape these strings in order to print them.)
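If you do read names out of yytname and want to print them, you may need to undo that quoting yourself. The following is only a sketch of the idea: display_name is a hypothetical helper, not something Bison provides, and it assumes the only escapes present are a backslash followed by the literal character (such as \" or \\). It does not interpret sequences like \n.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strips the surrounding double quotes from a
// yytname-style alias and drops the backslash from simple escapes.
// Plain names (NUMCONST) and character tokens ('+') are returned unchanged.
std::string display_name(const std::string& raw) {
    if (raw.size() < 2 || raw.front() != '"' || raw.back() != '"') {
        return raw;
    }
    std::string out;
    // Walk the characters between the opening and closing quote.
    for (std::size_t i = 1; i + 1 < raw.size(); ++i) {
        // A backslash followed by another inner character: skip the
        // backslash and keep the character it escapes.
        if (raw[i] == '\\' && i + 2 < raw.size()) {
            ++i;
        }
        out += raw[i];
    }
    return out;
}
```

For the alias in the example above, the table entry is the four characters "==" including the quotes, and display_name returns the two characters ==.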


Normally, there is no need to look up token names. If you merely want to track what the parser is doing, you're much better off enabling Bison's trace facility. At one time, it was somewhat common for lexical analysers to use the yytname table to look up token numbers for keyword tokens (which needed to be double-quoted in the Bison grammar), but that technique is currently being discouraged by the Bison maintainers.

Upvotes: 16

Sam Varshavchik

Reputation: 118292

bison generates an enum called yytokentype that contains an enumerated list of all the tokens in the grammar. It does not provide an equivalent mapping to strings containing all the token names.

So, you'll have to implement this mapping yourself. That is, write a utility function that takes a yytokentype parameter and returns the name of the given token, which you can subsequently use in your diagnostic messages. Another boring switch farm.

Having said that, it shouldn't be too difficult to write a utility Perl script, or an equivalent, that reads <filename>.tab.h that came out of bison, parses out the yytokentype enumeration, and robo-generates the mapping function. Stick that into your Makefile, with a suitable dependency rule, and you got yourself an automatic robo-generator of a token-to-name mapping function.
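As a rough sketch of what such a generator could look like, here it is in C++ rather than Perl. All three function names are made up for this example. It assumes the header contains a single "enum yytokentype { ... }" block, that the only comments are /* ... */ comments which do not themselves contain "*/", and that there are no preprocessor lines inside the enum body; the exact layout of the header differs between Bison versions, so treat this as a starting point.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Removes /* ... */ comments so that braces or commas inside them are ignored.
std::string strip_block_comments(const std::string& text) {
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t begin = text.find("/*", pos);
        if (begin == std::string::npos) {
            out.append(text, pos, std::string::npos);
            break;
        }
        out.append(text, pos, begin - pos);
        const std::size_t end = text.find("*/", begin + 2);
        if (end == std::string::npos) break;  // unterminated comment: drop the rest
        pos = end + 2;
    }
    return out;
}

// Collects the enumerator names from the first "enum yytokentype { ... }" block.
std::vector<std::string> token_names_from_header(const std::string& header) {
    std::vector<std::string> names;
    const std::string text = strip_block_comments(header);
    const std::size_t start = text.find("enum yytokentype");
    if (start == std::string::npos) return names;
    const std::size_t open = text.find('{', start);
    if (open == std::string::npos) return names;
    const std::size_t close = text.find('}', open);
    if (close == std::string::npos) return names;

    // Each comma-separated entry looks like "  NAME = 258"; keep the leading identifier.
    std::istringstream body(text.substr(open + 1, close - open - 1));
    std::string entry;
    while (std::getline(body, entry, ',')) {
        std::string name;
        for (char c : entry) {
            if (std::isalnum(static_cast<unsigned char>(c)) || c == '_') {
                name += c;
            } else if (!name.empty()) {
                break;  // the identifier has ended
            }
        }
        if (!name.empty()) names.push_back(name);
    }
    return names;
}

// Emits the source text of the "switch farm" mapping function.
std::string generate_token_name_function(const std::vector<std::string>& names) {
    std::string code = "const char* token_name(int token) {\n    switch (token) {\n";
    for (const std::string& name : names) {
        code += "    case " + name + ": return \"" + name + "\";\n";
    }
    code += "    default: return \"<unknown token>\";\n    }\n}\n";
    return code;
}
```

A small driver would read <filename>.tab.h into a string, pass it through token_names_from_header, and write the result of generate_token_name_function to a source file that includes the same header.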

Upvotes: 1
