Reputation: 2834
I've created a small grammar in ANTLR using Python (a grammar that can accept either a list of numbers or a list of IDs), and yet when I input a string such as December 12 1965, ANTLR will run on the file and show me no errors with the following code (all of the Python code that I'm using is embedded via the @main):
grammar ParserLang;

options {
  language=Python;
}

@header {
import sys
import antlr3
from ParserLangLexer import ParserLangLexer
}

@main {
def main(argv, otherArg=None):
  char_stream = antlr3.ANTLRInputStream(open(sys.argv[1],'r'))
  lexer = ParserLangLexer(char_stream)
  tokens = CommonTokenStream(lexer)
  parser = ParserLangParser(tokens);
  rule = parser.entry_rule()
}

program : idList EOF
        | integerList EOF
        ;

idList : ID whitespace idList
       | ID
       ;

integerList : INTEGER whitespace integerList
            | INTEGER
            ;

whitespace : (WHITESPACE | COMMENT) +;

ID : LETTER (DIGIT | LETTER)*;
INTEGER : (NONZERO_DIGIT DIGIT*) | ZERO ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
COMMENT : ('/*' .* '*/') | ('//' .* '\n') { $channel = HIDDEN; } ;

fragment ZERO : '0' ;
fragment DIGIT : '0' .. '9';
fragment NONZERO_DIGIT : '1' .. '9';
fragment LETTER : 'a' .. 'z' | 'A' .. 'Z';
Am I doing something wrong?
EDIT: When I use ANTLRWorks with the same grammar and input, a NoViableAltException is thrown. How do I get that error via code?
Upvotes: 1
Views: 1113
Reputation: 170128
I could not reproduce it. When I generate a lexer and parser from your grammar after fixing the error in it (rule = parser.entry_rule() should be: rule = parser.program()), and parse the input "December 12 1965" (either as input from a file, or as a plain string), I get the following error:
line 1:0 no viable alternative at input u'December'
Which may seem strange, since that could be the start of an idList. The fact is, your grammar contains one more error and a small thing that could be improved:
- WHITESPACE and COMMENT are placed on the HIDDEN channel, and are therefore not available in parser rules (at least, not without changing the channel from which the parser reads its tokens);
- a single-line COMMENT at the end of the input, that is, without a \n at the end, will not be properly tokenized. Better define a single-line comment like this: '//' ~('\r' | '\n')*. The trailing line break will be captured by the WHITESPACE rule after all.
The parser cannot match an idList (or an integerList, for that matter) because of the whitespace rule, so an error is produced pointing at the very first token ('December').
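The hidden-channel point can be illustrated with a toy model in plain Python. This is only a sketch: it does not use the antlr3 runtime, and Token, TOKEN_SPEC, tokenize and parser_view are names made up for illustration, as are the channel constants. It shows that whitespace is still tokenized, but never reaches a parser that reads only the default channel, so a parser rule that asks for a WHITESPACE token has nothing to match.

```python
import re
from collections import namedtuple

# Toy model only: the names mirror ANTLR concepts but this is not the antlr3 API.
# The channel numbers are arbitrary markers chosen for this sketch.
DEFAULT, HIDDEN = 0, 99

Token = namedtuple("Token", "type text channel")

TOKEN_SPEC = [
    ("ID", re.compile(r"[A-Za-z][A-Za-z0-9]*"), DEFAULT),
    ("INTEGER", re.compile(r"0|[1-9][0-9]*"), DEFAULT),
    ("WHITESPACE", re.compile(r"[ \t\r\n\f]+"), HIDDEN),
]


def tokenize(text):
    """Yield every token, including those sent to the hidden channel."""
    pos = 0
    while pos < len(text):
        for name, pattern, channel in TOKEN_SPEC:
            match = pattern.match(text, pos)
            if match:
                yield Token(name, match.group(), channel)
                pos = match.end()
                break
        else:
            raise ValueError("cannot tokenize at offset %d" % pos)


def parser_view(tokens):
    """Keep only the tokens a parser reading the default channel would see."""
    return [t for t in tokens if t.channel == DEFAULT]


all_tokens = list(tokenize("December 12 1965"))
visible = parser_view(all_tokens)

print([t.type for t in all_tokens])  # what the lexer produced
print([t.type for t in visible])     # what the parser gets to see
```

In this model the parser's view contains no WHITESPACE entry at all, which is why a rule shaped like ID whitespace idList cannot succeed while the lexer hides those tokens.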
Here's a grammar that works (as expected):
grammar ParserLang;

options {
  language=Python;
}

@header {
import sys
import antlr3
from ParserLangLexer import ParserLangLexer
}

@main {
def main(argv, otherArg=None):
  lexer = ParserLangLexer(antlr3.ANTLRStringStream('December 12 1965'))
  parser = ParserLangParser(CommonTokenStream(lexer))
  parser.program()
}

program : idList EOF
        | integerList EOF
        ;

idList : ID+
       ;

integerList : INTEGER+
            ;

ID : LETTER (DIGIT | LETTER)*;
INTEGER : (NONZERO_DIGIT DIGIT*) | ZERO ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
COMMENT : ('/*' .* '*/' | '//' ~('\r' | '\n')*) { $channel = HIDDEN; } ;

fragment ZERO : '0' ;
fragment DIGIT : '0' .. '9';
fragment NONZERO_DIGIT : '1' .. '9';
fragment LETTER : 'a' .. 'z' | 'A' .. 'Z';
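The difference between the two single-line comment definitions can be checked outside ANTLR with regular expressions. This is an approximation, not ANTLR itself: NEEDS_NEWLINE stands in for '//' .* '\n' (treated here as a non-greedy match up to the first newline) and STOPS_AT_EOL stands in for '//' ~('\r' | '\n')*. Both names and the sample strings are made up for this sketch.

```python
import re

# Regex stand-ins for the two lexer alternatives (an approximation of the idea).
NEEDS_NEWLINE = re.compile(r"//.*?\n", re.DOTALL)  # like '//' .* '\n'
STOPS_AT_EOL = re.compile(r"//[^\r\n]*")           # like '//' ~('\r' | '\n')*

with_newline = "// trailing note\n"
at_end_of_input = "// trailing note"  # last line of a file, no line break

# The first form only matches when a line break follows the comment.
old_matches_with_newline = NEEDS_NEWLINE.fullmatch(with_newline) is not None
old_matches_at_end = NEEDS_NEWLINE.fullmatch(at_end_of_input) is not None

# The second form matches either way and leaves the line break untouched.
new_matches_at_end = STOPS_AT_EOL.fullmatch(at_end_of_input) is not None
new_consumed = STOPS_AT_EOL.match(with_newline).group()

print(old_matches_with_newline, old_matches_at_end)
print(new_matches_at_end, repr(new_consumed))
```

With the second pattern the line break is not part of the comment, so it is left for the WHITESPACE rule, as described earlier.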
Running the parser generated from the grammar above will also produce an error:
line 1:9 missing EOF at u'12'
but that is expected: after an idList, the parser expects the EOF, but it encounters '12' instead.
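To make that last error concrete, here is a hand-written recognizer in plain Python that mimics the corrected program rule. It is a sketch under stated assumptions, not the code ANTLR generates: lex and check_program are hypothetical helpers, the message format only imitates the one quoted above (without the Python 2 u prefix), comments are left out, and the column arithmetic assumes single-line input.

```python
import re

TOKEN_RE = re.compile(
    r"(?P<ID>[A-Za-z][A-Za-z0-9]*)|(?P<INTEGER>0|[1-9][0-9]*)|(?P<WS>\s+)"
)


def lex(text):
    """Return (type, text, column) triples, dropping whitespace like a hidden channel."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at column %d" % pos)
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group(), pos))
        pos = match.end()
    return tokens


def check_program(text):
    """Mimic program : idList EOF | integerList EOF ; return None or an error string."""
    tokens = lex(text)
    if not tokens:
        return "empty input: neither alternative matches"
    list_type = tokens[0][0]  # the first token selects the alternative
    for token_type, token_text, column in tokens[1:]:
        if token_type != list_type:  # the ID+ or INTEGER+ loop stops here
            return "line 1:%d missing EOF at %r" % (column, token_text)
    return None


print(check_program("December 12 1965"))
print(check_program("12 1965"))
print(check_program("December January"))
```

The first call reports the problem at column 9, where '12' starts, because the ID+ loop has ended and only EOF would be acceptable there. The other two inputs consist of a single token type and are accepted.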
Upvotes: 2