Daniel Walker

Reputation: 6770

Recognize specific grammatical mistakes with Bison

I'm attempting to use Bison to develop my own programming language. I've got the .y file written for my grammar. However, I'm wondering if there's a way, in the case that the user attempts to parse source code with invalid grammar, to have Bison give a useful error message. For example, suppose I have the following rule in my grammar:

if_statement: IF expr '{' statement_list '}' {$$=createNode(IF,$2,$4);}
    ;

Suppose the source code left out the closing brace. According to my understanding, Bison would report that it was unable to find a rule to reduce the code. Could Bison be made to recognize that there is an unfinished if which begins on line such-and-such and report that to the user?

Upvotes: 0

Views: 182

Answers (1)

rici

Reputation: 241901

Missing braces are very rarely detected where they happen, because it is usually the case that whatever follows the missing brace could just as well have come before it. That's particularly clear if the missing close brace is immediately followed by another closing brace, but it could simply be followed (in this case) by another statement:

function some_function() {
    ....
    while (some_condition) {
        ...
        if (some_other_condition) {
            ...
            break;
//      }          /* Commented out by mistake */
        a = 3;
        ...
    }
    return a;
}

function another_function() {
    ...
}

If your language doesn't allow nested function definitions then the definition of another_function will trigger an error; if it does allow nested function definitions, then another_function will just be defined in an unexpected scope and the parse will continue, perhaps until the end of file.

One way of detecting errors like this is to compare the indentation of every line with the expected indentation. However, unless your language has some concept of correct indentation (as Python does, for example), you cannot flag misleading indentation as an error. So the best you can do is record the unexpected indentation and use it as a clue when a syntax error is finally encountered (if one ever is, since it might just be that the programmer doesn't care to make their programmes human-readable). The complications in this approach to error detection are probably why it is so uncommon in mainstream languages, although personally I think it's an approach with a lot of potential.
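As a rough illustration of the indentation clue, here is a minimal C sketch. It is not part of Bison or Flex: the function name `first_suspect_block`, the `open_block` struct and the simplifying assumptions (spaces only for indentation, a block opens on any line ending in `{`, a line starting with `}` closes one, `//` lines are ignored) are all hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <string.h>

#define MAX_DEPTH 64

/* One open block: the line holding its '{' and that line's indentation. */
struct open_block { int line; int indent; };

/* Returns the line number of the first open block that is followed by a
   statement indented no deeper than the block's opening line, or 0 if no
   such clue is found. *clue_line receives the line of that statement.
   Assumes indentation uses spaces only (a simplification). */
int first_suspect_block(const char *src, int *clue_line)
{
    struct open_block stack[MAX_DEPTH];
    int depth = 0, line_no = 0;

    assert(src != NULL && clue_line != NULL);
    while (*src) {
        const char *nl = strchr(src, '\n');
        size_t len = nl ? (size_t)(nl - src) : strlen(src);
        const char *next = nl ? nl + 1 : src + len;
        line_no++;

        size_t indent = 0;
        while (indent < len && src[indent] == ' ') indent++;
        size_t last = len;  /* one past the last significant character */
        while (last > indent && (src[last - 1] == ' ' || src[last - 1] == '\r')) last--;

        int blank = (last == indent);
        int comment = (last - indent >= 2 && src[indent] == '/' && src[indent + 1] == '/');
        if (!blank && !comment) {
            if (src[indent] == '}') {
                if (depth > 0) depth--;          /* closes the innermost block */
            } else if (depth > 0 && (int)indent <= stack[depth - 1].indent) {
                *clue_line = line_no;            /* dedent without a '}' */
                return stack[depth - 1].line;
            }
            if (src[last - 1] == '{' && depth < MAX_DEPTH) {
                stack[depth].line = line_no;
                stack[depth].indent = (int)indent;
                depth++;
            }
        }
        src = next;
    }
    return 0;
}
```

The result is only a hint to attach to a later syntax error message, not an error in its own right, since the language itself places no requirement on indentation.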

I usually advocate parsing erroneous programs twice. The first parse is optimised for correct programs, which means that it doesn't need any of the overhead required for good error messages, such as tracking the position of every token. If the program turns out to be syntactically correct, you can then move on to turning the AST into compiled code. If the program turns out to have a syntax error, you can restart the parse from the beginning, and at that point you are free to use heuristics like indentation checks to try to localise errors better.
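The two-pass idea can be sketched in C with a deliberately tiny stand-in for a parser: a brace-balance check. Everything here (`balanced_fast`, `locate_brace_error`, `check_source`) is a hypothetical name invented for the example. In a real project, both passes would be Bison parsers built from the same grammar, with and without location tracking.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 256

/* Fast pass: no position tracking, it only accepts or rejects. */
static int balanced_fast(const char *src)
{
    int depth = 0;
    for (; *src; src++) {
        if (*src == '{') depth++;
        else if (*src == '}' && --depth < 0) return 0;
    }
    return depth == 0;
}

/* Slow pass: tracks line numbers so that it can name a location.
   Returns the line of a stray '}' if one is found, otherwise the line of
   the innermost '{' still open at end of input, or 0 if nothing is wrong. */
static int locate_brace_error(const char *src)
{
    int open_line[MAX_DEPTH];
    int depth = 0, line = 1;
    for (; *src; src++) {
        if (*src == '\n') {
            line++;
        } else if (*src == '{') {
            if (depth < MAX_DEPTH) open_line[depth] = line;
            depth++;
        } else if (*src == '}') {
            if (depth == 0) return line;
            depth--;
        }
    }
    if (depth == 0) return 0;
    return open_line[(depth < MAX_DEPTH ? depth : MAX_DEPTH) - 1];
}

/* Returns 0 for accepted input, otherwise the line to report. */
int check_source(const char *src)
{
    assert(src != NULL);
    if (balanced_fast(src)) return 0;   /* common case: no diagnostics cost */
    return locate_brace_error(src);     /* rare case: rescan with tracking */
}
```

Note that the brace reported is the one left open at the end of the input, which is not necessarily the one the programmer forgot. That is why the second pass is the natural place to add heuristics.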

Having said all that, you may well do better to move on to implementation of your language and return to the problem of producing better diagnostics later.

Bison does offer a mechanism for producing more useful error messages in some cases.

First, you should at least enable line number tracking from Flex, which is almost zero effort. You might also want to track precise token position, which is a bit more work but not too much. (See "Character Position from starting of a line", https://stackoverflow.com/a/48879103/1566221, and "yyllocp->first_line returns uninitialized value in second iteration of a reEntrant Bison parser", among others, for sample code.)
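For a sense of what precise position tracking involves, here is a minimal C sketch of the bookkeeping. The `location` struct mirrors the four fields of Bison's default location type, but the struct and the `advance_location` helper are hypothetical names written for this example. In a Flex scanner, a call like this would typically be made from the `YY_USER_ACTION` macro, passing `yytext` and `yyleng`; that wiring is assumed here and not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Same four fields as Bison's default location type. In this sketch,
   last_line/last_column name the position just past the token. */
struct location {
    int first_line, first_column;
    int last_line, last_column;
};

/* Advance loc over one token: the token starts where the previous one
   ended, and its end is found by walking the token text. */
void advance_location(struct location *loc, const char *text, size_t len)
{
    assert(loc != NULL && text != NULL);
    loc->first_line = loc->last_line;
    loc->first_column = loc->last_column;
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '\n') {
            loc->last_line++;
            loc->last_column = 1;   /* columns restart on each new line */
        } else {
            loc->last_column++;
        }
    }
}
```

Starting from `{1, 1, 1, 1}` and calling the function once per matched token (including whitespace and newlines) keeps the location in step with the input.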

Second, ask bison to produce verbose error messages. That only requires two extra lines in the declarations section of your bison file:

%define parse.error verbose
%define parse.lac full

Please do read the bison manual for some important caveats. In particular, LAC may involve significant overhead. But the error messages produced are often helpful.

Finally, use bison's error recovery mechanism to continue the parse after the first syntax error is detected, thus allowing you to report several syntax errors in a single run. That's usually less frustrating for a user, although you should terminate the parse at some threshold error count, because really high error counts after error recovery usually mean that the error recovery itself failed and that many of the subsequent error messages were bogus.

Again, the bison manual has some useful suggestions about how to use the error facilities.

Bison manual table of contents

Upvotes: 1
