Chris Covert

Reputation: 2834

Why does my ANTLR grammar seem to properly parse this input?

I've created a small grammar in ANTLR with the Python target (a grammar that accepts either a list of numbers or a list of IDs). Yet when I feed it a string such as December 12 1965, ANTLR runs on the file and reports no errors with the following code (all of the Python code I'm using is embedded via @main):

grammar ParserLang;

options {
    language=Python;
}

@header {
import sys
import antlr3

from ParserLangLexer import ParserLangLexer
}

@main { 
def main(argv, otherArg=None):
    char_stream = antlr3.ANTLRInputStream(open(sys.argv[1],'r'))
    lexer = ParserLangLexer(char_stream)

    tokens = CommonTokenStream(lexer)
    parser = ParserLangParser(tokens);

    rule   = parser.entry_rule()
}

program     : idList EOF
            | integerList EOF
            ;

idList      : ID whitespace idList 
            | ID
            ;

integerList : INTEGER whitespace integerList 
            | INTEGER
            ;

whitespace  : (WHITESPACE | COMMENT) +;

ID            : LETTER (DIGIT | LETTER)*;
INTEGER       : (NONZERO_DIGIT DIGIT*) | ZERO ;
WHITESPACE    : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+    { $channel = HIDDEN; } ;
COMMENT       : ('/*' .* '*/') | ('//' .* '\n') { $channel = HIDDEN; } ;

fragment ZERO            : '0' ;
fragment DIGIT         : '0' .. '9';
fragment NONZERO_DIGIT : '1' .. '9';
fragment LETTER        : 'a' .. 'z' | 'A' .. 'Z';

Am I doing something wrong?

EDIT: When I use ANTLRWorks with the same grammar and input, a NoViableAltException is thrown. How do I get that error via code?

Upvotes: 1

Views: 1113

Answers (1)

Bart Kiers

Reputation: 170128

I could not reproduce it. When I generate a lexer and parser from your input after fixing the error in the grammar (rule = parser.entry_rule() should be: rule = parser.program()), and parse the input "December 12 1965" (either as input from a file, or as a plain string), I get the following error:

line 1:0 no viable alternative at input u'December'

That may seem strange, since 'December' could be the start of an idList. The fact is, your grammar contains one more error and a small thing that could be improved:

  • WHITESPACE and COMMENT are placed on the HIDDEN channel, and are therefore not available in parser rules (at least, not without changing the channel from which the parser reads its tokens...);
  • a COMMENT at the end of the input, that is, without a \n at the end, will not be properly tokenized. Better define a single line comment like this: '//' ~('\r' | '\n')*. The trailing line break will be captured by the WHITESPACE rule after all.
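To illustrate the second point outside of ANTLR, here is a rough sketch using Python's re module. The two patterns are only regex approximations of the lexer rules ('//' .* '\n' versus '//' ~('\r' | '\n')*), and the names NEEDS_NEWLINE, STOPS_AT_EOL and match_comment are made up for this example:

```python
import re

# Approximation of the original rule: '//' .* '\n' (a trailing newline is required).
NEEDS_NEWLINE = re.compile(r"//.*?\n", re.DOTALL)
# Approximation of the suggested rule: '//' ~('\r' | '\n')* (stops before the line break).
STOPS_AT_EOL = re.compile(r"//[^\r\n]*")

def match_comment(pattern, text):
    """Return the comment matched at the start of text, or None if there is no match."""
    m = pattern.match(text)
    return m.group(0) if m else None

last_line = "// trailing comment"      # comment at end of input, no '\n'
middle_line = "// a comment\nnext"     # comment followed by more input

print(match_comment(NEEDS_NEWLINE, last_line))    # no match: the '\n' is missing
print(match_comment(STOPS_AT_EOL, last_line))     # matches the whole comment
print(match_comment(STOPS_AT_EOL, middle_line))   # matches up to, not including, '\n'
```

With the second pattern the line break is left in the input, which is where the WHITESPACE rule would pick it up.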

Because the whitespace rule can never match, the parser cannot match an idList (or an integerList, for that matter), so an error is produced pointing at the very first token ('December').
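The effect of the HIDDEN channel can be mimicked with a small hand-written tokenizer. This is only a sketch of the idea, not the ANTLR runtime: tokenize, parser_view and the channel constants are hypothetical names, and the regex merely approximates the lexer rules of the grammar.

```python
import re

DEFAULT, HIDDEN = 0, 99  # illustrative channel numbers for this sketch

TOKEN_RE = re.compile(
    r"(?P<ID>[A-Za-z][A-Za-z0-9]*)"
    r"|(?P<INTEGER>0|[1-9][0-9]*)"
    r"|(?P<WHITESPACE>[ \t\r\n\f]+)"
)

def tokenize(text):
    """Split text into (kind, text, channel) triples; whitespace goes to HIDDEN."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at column %d" % pos)
        kind = m.lastgroup
        channel = HIDDEN if kind == "WHITESPACE" else DEFAULT
        tokens.append((kind, m.group(0), channel))
        pos = m.end()
    return tokens

def parser_view(tokens):
    """Keep only what a parser reading the default channel would see."""
    return [(kind, text) for kind, text, channel in tokens if channel == DEFAULT]

all_tokens = tokenize("December 12 1965")
visible = parser_view(all_tokens)
print(len(all_tokens), "tokens lexed,", len(visible), "visible to the parser")
print(visible)
```

No WHITESPACE token shows up in the parser's view, so a parser rule that demands one between two IDs has nothing to match.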

Here's a grammar that works (as expected):

grammar ParserLang;

options {
    language=Python;
}

@header {
import sys
import antlr3

from ParserLangLexer import ParserLangLexer
}

@main { 
def main(argv, otherArg=None):
    lexer = ParserLangLexer(antlr3.ANTLRStringStream('December 12 1965'))
    parser = ParserLangParser(CommonTokenStream(lexer))
    parser.program()
}

program     : idList EOF
            | integerList EOF
            ;

idList      : ID+
            ;

integerList : INTEGER+
            ;

ID          : LETTER (DIGIT | LETTER)*;
INTEGER     : (NONZERO_DIGIT DIGIT*) | ZERO ;
WHITESPACE  : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
COMMENT     : ('/*' .* '*/' | '//' ~('\r' | '\n')*)   { $channel = HIDDEN; } ;

fragment ZERO          : '0' ;
fragment DIGIT         : '0' .. '9';
fragment NONZERO_DIGIT : '1' .. '9';
fragment LETTER        : 'a' .. 'z' | 'A' .. 'Z';

Running the parser generated from the grammar above will also produce an error:

line 1:9 missing EOF at u'12'

but that is expected: after an idList, the parser expects the EOF, but it encounters '12' instead.
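To see where the 1:9 comes from, the corrected program rule can be imitated by hand. The sketch below is not what ANTLR generates; it is a simplified stand-in that assumes single-line input with no comments, and tokenize and parse_program are hypothetical helper names.

```python
import re

# Optional leading whitespace, then an ID or an INTEGER (approximating the lexer rules).
TOKEN_RE = re.compile(r"\s*(?:(?P<ID>[A-Za-z][A-Za-z0-9]*)|(?P<INTEGER>0|[1-9][0-9]*))")

def tokenize(text):
    """Return (kind, text, column) triples, skipping whitespace and ending with EOF."""
    tokens, pos = [], 0
    end = len(text.rstrip())
    while pos < end:
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at column %d" % pos)
        kind = m.lastgroup
        tokens.append((kind, m.group(kind), m.start(kind)))
        pos = m.end()
    tokens.append(("EOF", "<EOF>", len(text)))
    return tokens

def parse_program(text):
    """Mimic program : idList EOF | integerList EOF ; return an error message or None."""
    tokens = tokenize(text)
    first_kind, first_text, first_col = tokens[0]
    if first_kind not in ("ID", "INTEGER"):
        return "line 1:%d no viable alternative at input %r" % (first_col, first_text)
    i = 0
    while tokens[i][0] == first_kind:   # ID+ or INTEGER+; the EOF token ends the loop
        i += 1
    kind, tok_text, col = tokens[i]
    if kind != "EOF":
        return "line 1:%d missing EOF at %r" % (col, tok_text)
    return None

print(parse_program("December 12 1965"))
print(parse_program("December January"))
```

'December' occupies columns 0 to 7 and the space is column 8, so the first token that is not an ID, '12', starts at column 9.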

Upvotes: 2
