Reputation: 47
What's the meaning of the token 'error'? How to detect an error without ';'?
Upvotes: 1
Views: 1109
Reputation: 126175
One common source of confusion: the error token is for error recovery, not error detection. Syntax errors are detected and reported automatically by the parser. You can detect other errors in the actions and tell bison about them by using the YYERROR macro.
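As a rough illustration of detecting a non-syntax error in an action, here is a grammar fragment that rejects division by zero. The exp rule, its integer semantic values, and the user-supplied yyerror(const char *) function are assumptions made for this sketch. They are not taken from the question. Note that YYERROR only starts error recovery and prints nothing by itself, so the message is reported explicitly first.

```yacc
/* Fragment only: assumes '/' has a declared precedence and that
   yyerror(const char *) is defined elsewhere in the grammar file. */
exp : exp '/' exp
        {
            if ($3 == 0) {
                yyerror("division by zero"); /* YYERROR does not print a message */
                YYERROR;                     /* start error recovery, as for a syntax error */
            }
            $$ = $1 / $3;
        }
    ;
```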
Conceptually, the error token replaces a sequence of zero or more input tokens, in an attempt to convert an invalid input stream into a valid one. When an error occurs, the bison-generated parser goes into error recovery mode, discarding tokens and states until it gets to a point where the error pseudo-token can be shifted. It then shifts the error token and attempts to continue from there.
Upvotes: 2
Reputation: 241671
After the error pseudo-terminal is matched, the bison parser continues to parse in the normal way, except that it discards tokens which "cannot be handled".
If it encounters a token which immediately follows the error token, it can shift that token, which means that it will stop discarding tokens.
However, that is not the only way the parser can handle a token. It could also handle it by doing a reduction.
Here, the word "handled" is interpreted a bit loosely, since a reduction action does not actually accept the lookahead token. Nonetheless, it is sufficient for the error production to be reduced.
In such a case, care must be taken not to call yyerrok. If error handling is cancelled with yyerrok and the lookahead token cannot be shifted, then the error handler will be reentered and it is possible to fall into an endless loop.
For example,
    commands: %empty
            | commands command
            ;
    command : exp ';'   { printf("Value is %d\n", $1); }
            | error ';' { printf("Bad expression\n"); yyerrok; }
            | error     { printf("Missing semicolon\n"); }
            ;
The first command production causes the result of a correct expression to be printed out. The second production deals with syntax errors where there is still a semicolon. It can cancel error handling because the ; has already been shifted, so it is ok to restart error handling.
The third production deals with a missing semicolon. Here, we cannot call yyerrok because it is possible that the lookahead token is an illegal token, such as !. If we were to call yyerrok, the error status would be cleared, and error handling would be immediately reentered with the same exclamation mark as the lookahead token, causing an endless loop. But without yyerrok, the parser is still in error-handling mode and the offending token will be discarded.
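For anyone who wants to experiment with these three productions, here is a minimal sketch of a complete grammar file. Everything outside the commands and command rules is an assumption added so that the file builds: the NUM token, the small exp grammar, the hand-written yylex, and the hypothetical file name calc.y.

```yacc
%{
#include <ctype.h>
#include <stdio.h>

int yylex(void);
void yyerror(const char *msg);
%}

%token NUM
%left '+' '-'

%%

commands: %empty
        | commands command
        ;

command : exp ';'   { printf("Value is %d\n", $1); }
        | error ';' { printf("Bad expression\n"); yyerrok; }
        | error     { printf("Missing semicolon\n"); }  /* no yyerrok here */
        ;

exp     : NUM
        | exp '+' exp { $$ = $1 + $3; }
        | exp '-' exp { $$ = $1 - $3; }
        ;

%%

/* Hypothetical scanner: skips whitespace, returns NUM for digit runs,
   and returns any other character as its own token code. */
int yylex(void) {
    int c;
    while ((c = getchar()) == ' ' || c == '\t' || c == '\n')
        ;
    if (c == EOF)
        return 0;               /* 0 tells the parser the input has ended */
    if (isdigit(c)) {
        yylval = c - '0';
        while (isdigit(c = getchar()))
            yylval = yylval * 10 + (c - '0');
        ungetc(c, stdin);       /* push back the first non-digit */
        return NUM;
    }
    return c;                   /* ';', '+', '-', or an illegal character such as '!' */
}

void yyerror(const char *msg) { fprintf(stderr, "%s\n", msg); }

int main(void) { return yyparse(); }
```

Assuming bison 3.0 or later (needed for %empty), this should build with `bison -o calc.c calc.y` followed by `cc -o calc calc.c`. Adding yyerrok to the third production and then feeding the program an input containing ! is one way to observe the endless loop described above.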
Note: The above was intended to help answer the question of what the effect would be of an error production with nothing following the error token. It was not intended to answer any question not being asked, such as "How do I do X?" (for various values of X). The provided example is a bit artificial. The original used a newline character as the expression terminator, and it was not necessary to include the second error-handling production, since it is effectively impossible to leave out a terminating newline except at EOF.
Upvotes: 2