Reputation: 153
So I'm working on a combined grammar in ANTLR4 using ANTLRWorks 2.1. I have the lexer rules Identifier and Block that are not being recognized as defined lexer rules, but only in the last parser rule defined. Adding a literal after these rules removes (or hides) these errors.
My grammar, with the error at the end (the tokens wrapped in underscores in the last rule, _Identifier_ and _Block_, are the ones throwing the error):
grammar GCombined;
options { language = Cpp; }
@lexer::namespace{AntlrTest01}
@parser::namespace{AntlrTest01}
/* First Lexer Stage */
Bit: '0' | '1';
Digit : '0'..'9';
ODigit: '0'..'7';
XDigit: '0'..'f';
Letter: ('a'..'z') | ('A'..'Z');
Symbol: '|'
| '-'
| '!'
| '#'
| '$'
| '%'
| '&'
| '('
| ')'
| '*'
| '+'
| ','
| '-'
| '.'
| '/'
| ':'
| ';'
| '<'
| '='
| '>'
| '?'
| '@'
| '['
| ']'
| '^'
| '_'
| '`'
| '{'
| '|'
| '}'
| '~';
WSpace: ( ' '
| '\t'
| '\r'
| '\n'
| '\c'
| '\0'
| '\u000C'
)+ -> skip;
DNumber: Digit+;
ONumber: '0o' Digit+;
XNumber: '0x' Digit;
Integer: DNumber
| ONumber
| XNumber;
Float: DNumber '.' DNumber;
Character: Letter
| Digit
| Symbol
| WSpace;
String: Character+;
Literal: '"' String '"';
Boolean: 'true' | 'false';
/* Second Lexer Stage */
Number: Integer | Float;
Identifier: Letter (Letter | Digit | '_')+;
Keyword: Letter+;
Operator: '+'
| '-'
| '*'
| '/'
| '%'
| '=='
| '!='
| '>'
| '<'
| '>='
| '<='
| '&&'
| '||'
| '^'
| '&'
| '|'
| '<<'
| '>>'
| '~' ;
Expression: (Operator | Identifier)
'(' (Identifier | Number)+ ')';
Parameter: Identifier
| Expression
| Number;
Statement: Keyword '(' Parameter+ ')';
Block: '{' Statement+ '}';
/* Third Lexer Stage */
Add: '+';
Sub: '-';
Mlt: '*';
Div: '/';
Mod: '%';
Mathop: Add | Sub | Mlt | Div | Mod;
Deq: '==';
Neq: '!=';
Gtr: '>';
Lss: '<';
Geq: '>=';
Leq: '<=';
Condop: Deq | Neq | Gtr | Lss | Geq | Leq;
And: '&&';
Or: '||';
Xor: '^';
Bnd: '&';
Bor: '|';
Logop: And | Or | Xor | Bnd | Bor;
Neg: '!';
Boc: '~';
Negop: Neg | Boc;
Asl: '<<';
Asr: '>>';
Shftop: Asl | Asr;
Eql: '=';
Inc: '++';
Dec: '--';
Incop: Inc | Dec;
Peq: '+=';
Meq: '-=';
Teq: '*=';
Seq: '/=';
Req: '%=';
Casop: Peq | Meq | Teq | Seq | Req;
Lparen: '(';
Rparen: ')';
Lbrack: '[';
Rbrack: ']';
Lbrace: '{';
Rbrace: '}';
Point : '.';
Colon : ':';
Numvar: Number
| Identifier
| Mathop '(' Parameter+ ')';
Boolvar: Boolean
| Identifier
| Condop '(' Parameter+ ')'
| Logop '(' Parameter+ ')';
Metaxpr: (Identifier | Operator ) '(' Parameter+ ')';
/* First Parser Stage */
//expressions
add: '+' '(' Numvar+ ')';
sub: '-' '(' Numvar+ ')';
mlt: '*' '(' Numvar+ ')';
div: '/' '(' Numvar+ ')';
mod: '%' '(' Integer+ ')';
mathexpr: add
| sub
| mlt
| div
| mod;
eql: '==' '(' Parameter+ ')';
neq: '!=' '(' Parameter+ ')';
gtr: '>' '(' Parameter+ ')';
les: '<' '(' Parameter+ ')';
geq: '>=' '(' Parameter+ ')';
leq: '<=' '(' Parameter+ ')';
condexpr: eql
| neq
| gtr
| les
| geq
| leq;
and: '&&' '(' Parameter+ ')';
or : '||' '(' Parameter+ ')';
xor: '^' '(' Parameter+ ')';
bnd: '&' '(' Parameter+ ')';
bor: '|' '(' Parameter+ ')';
logexpr: and
| or
| xor
| bnd
| bor;
asl: '<<' '(' Parameter Numvar ')';
asr: '>>' '(' Parameter Numvar ')';
shiftexpr: asl | asr;
neg: '!' '(' Parameter ')';
boc: '~' '(' Parameter ')';
negexpr: neg
| boc;
arrexpr: Identifier '[' Numvar ']';
//instruction forms
vardec: 'def' '(' Identifier+ ')' ': ' Identifier ;
lindec: Identifier '(' Identifier ')';
assign: '=' '(' (Identifier | lindec) Parameter ')';
incstmt: (Inc | Dec) '(' Identifier ')'
| Casop '(' Identifier Identifier ')';
cond: 'if' '(' Boolvar ')' Block
('else if' '(' Boolvar ')' Block)?
('else' Block)?;
loop: (
('while' '(' (condexpr | negexpr) ')')
| ('for' '(' assign ',' (condexpr | negexpr) ',' incstmt')')
) Block;
fundef: 'func' '(' Identifier Parameter+ ')' ': ' Identifier Block;
prodef: 'proc' '(' Identifier Parameter* ')' Block;
call: Identifier '(' Parameter+ ')';
excHandler: 'try' Block
'catch' '(' Identifier ')' Block
('finally' Block)?;
classdef: 'class' '(' Identifier ')' (': ' _Identifier_)? _Block_;
Upvotes: 2
Views: 493
Reputation: 5991
ANTLR requires unambiguous grammar rules. In the provided grammar, the Symbol rule conflicts with the Operator rule and others. The Identifier and Letter rules conflict. Rules conflict when they can match the same input (same content and same length).
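As a rough illustration of how such conflicts are usually avoided, here is a minimal sketch, not a drop-in fix for the grammar in the question. The grammar name ConflictSketch and the If rule are hypothetical, and Identifier is loosened to allow a single letter. It relies on two ANTLR 4 lexer behaviours: the longest match wins, and when two rules match the same text with the same length, the rule defined first wins.

```antlr
lexer grammar ConflictSketch;   // hypothetical name, for illustration only

// Helper rules marked 'fragment' never produce tokens of their own,
// so they cannot compete with the token rules that reference them.
fragment Letter : [a-zA-Z] ;
fragment Digit  : [0-9] ;

// 'if' matches both If and Identifier with the same length;
// If is listed first, so the lexer emits an If token.
If         : 'if' ;
Identifier : Letter (Letter | Digit | '_')* ;
Number     : Digit+ ;
WS         : [ \t\r\n]+ -> skip ;
```

With this layout, building blocks such as Letter and Digit stay out of the token stream, and only the rules the parser actually needs produce tokens.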
Also, for example, the Symbol rule includes '{' as an alternative. Subsequent rules that use the literal '{' (which is an implicit token type) in any of their alternatives will not match, because the implicit token type is not the same as the Symbol token type. Best practice is to avoid redundant use of literals: define the literal in a rule, and then just reference that rule.
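A small sketch of that practice, under some assumptions: the grammar name LiteralSketch is hypothetical, and block and statement are written here as parser rules (lowercase) rather than as the lexer rules used in the question. Each literal is defined once in a named lexer rule, and the parser rules refer to it by name.

```antlr
grammar LiteralSketch;   // hypothetical name, for illustration only

// Parser rules reference the named tokens instead of repeating '{', '}', '(' and ')'.
block     : Lbrace statement+ Rbrace ;
statement : Identifier Lparen Identifier* Rparen ;

// Each literal appears exactly once, in a single lexer rule.
Lbrace     : '{' ;
Rbrace     : '}' ;
Lparen     : '(' ;
Rparen     : ')' ;
Identifier : [a-zA-Z] [a-zA-Z0-9_]* ;
WS         : [ \t\r\n]+ -> skip ;
```

Because no literal is shared between a catch-all rule like Symbol and the parser rules, every '{' in the input ends up with one well-defined token type.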
Best advice would be to buy a copy of The Definitive ANTLR 4 Reference (TDAR) to learn ANTLR 4.
Upvotes: 2