Reputation: 493
When I run my grammar (lexer and parser) in PowerShell, it produces these errors:
line 1:6 no viable alternative at input 'global '
I've tried searching Google and SO for the errors and the infinite loop, but I can't really understand what I find (probably because I only started with ANTLR a few days ago).
Test file:
global a = 2
Lexer grammar:
lexer grammar Test;
// tokens { SMT, SST, SCH, USM, USS, USC, NMT, NST, NCH, UNM, UNS, NAS, UNC, NAC, MLC, SLC, PXI, DXI, PXF, DXF, PXB, DXB, PXD, DXD, PXP, DXP, PRX, DEX, IMG, FLT, DBL, DCM, PRC, INT, CKW, KWR, RMH, IMH, NAN, IND, UND, NIL, NON, TRU, FLS, IDN, WSP, BKS, SMC, CMA, IVC }
MST: '\'\'' -> pushMode(multiline);
SMT: SPS SDSQ ODQ*? SDSQ;
SST: SPS '"' NTD* '"';
SCH: [rf] SLSQ OQC SLSQ;
USM: SPS SDSQ ODQ*? EOF;
USS: SPS '"' NTD* (EOL* ODC*)? EOF;
USC: [rf] SLSQ OQC? EOF;
NMT: SDSQ ODQ*? SDSQ;
NST: '"' NTD* '"';
NCH: SLSQ OQC SLSQ;
UNM: SDSQ ODQ*? EOF;
UNS: '"' NTD* (EOL* ODC*)? EOF;
NAS: SPS? '"' NTD* EOL* ODC* '"';
UNC: SLSQ OQC? EOF;
NAC: (('p'[f]?|[f]?'p')? SLSQ OQC+ SLSQ? | [rpf]? SLSQ OQC OQC+ SLSQ?);
MLC: '##' NDH* '##'? -> skip;
SLC: '--' NEN* -> skip;
EOL: '\r'? '\n' | [\n\u000b\f\r\u0085\u2028\u2029];
EML: '[]';
EMS: '{}';
PXI: NUM 'E' NUM 'i';
DXI: NUM 'e' NUM 'i';
PXF: NUM 'E' NUM 'f';
DXF: NUM 'e' NUM 'f';
PXB: NUM 'E' NUM 'd';
DXB: NUM 'e' NUM 'd';
PXD: NUM 'E' NUM 'D';
DXD: NUM 'e' NUM 'D';
PXP: NUM 'E' NUM 'p'
| '+'? NUM 'E' NUM
;
DXP: NUM 'e' NUM 'p'
| '+'? NUM 'e' NUM
;
PRX: '-'? NUM 'E' NUM;
DEX: '-'? NUM 'e' NUM;
IMG: NUM? 'i';
FLT: NUM 'f'
| DIGIT+ '.' DIGIT DIGITFQ DIGITTQ
;
DBL: NUM 'd'
| DIGIT+ '.' DIGITE DIGITFQ DIGITGQ
;
DCM: NUM 'D'
| DIGIT+ '.' DIGITE DIGITE DIGITEQ DIGITFQ DIGITGQ
;
PRC: NUM 'p'
| DIGIT+ '.' DIGITE DIGITE DIGITE DIGITE DIGIT*
;
INT: INTEGER | DIGIT;
OPN: 'opn';
OUT: 'out';
OUTF: 'outf';
PRINT: 'print';
PRINTF: 'printf';
LAMBDA: 'lambda';
FUNC: 'func';
ERR: 'err';
ERRF: 'errf';
ASSERT: 'assert';
ASSERTF: 'assertf';
FORMAT: 'format';
SWITCH: 'switch';
ABS: 'abs';
ASCII: 'ascii';
CALLABLE: 'callable';
CHR: 'chr';
DIR: 'dir';
EVAL: 'eval';
EXEC: 'exec';
FILTER: 'filter';
GET: 'get';
HASH: 'hash';
ID: 'id';
INST: 'inst';
SUB: 'sub';
SUPER: 'super';
MAX: 'max';
MIN: 'min';
OBJ: 'obj';
ORD: 'ord';
POWF: 'pow';
REV: 'rev';
REPR: 'repr';
ROUND: 'round';
FLOOR: 'floor';
CEIL: 'ceil';
MUL: 'mul';
SORT: 'sort';
ADD: 'add';
ZIP: 'zip';
WAIT: 'wait';
SECS: 'secs';
MILS: 'mils';
BENCHMARK: 'benchmark';
HAS: 'has';
SIBLING: 'sibling';
A: 'a';
CHILD: 'child';
OF: 'of';
WHILE: 'while';
FOR: 'for';
DO: 'do';
DEL: 'del';
NEW: 'new';
IMPORT: 'import';
EXPORT: 'export';
DEF: 'def';
END: 'end';
GLOBAL: 'global';
BREAK: 'break';
CONTINUE: 'continue';
NOT: 'not';
AND: 'and';
OR: 'or';
IN: 'in';
CASE: 'case';
DEFAULT: 'default';
RETURN: 'return';
TRY: 'try';
EXCEPT: 'except';
FINALLY: 'finally';
ELIF: 'elif';
IF: 'if';
ELSE: 'else';
AS: 'as';
STRT: 'str';
INTT: 'int';
NUMT: 'num';
DECIMALT: 'decimal';
FLOATT: 'float';
DOUBLET: 'double';
PRECISET: 'precise';
EXPNT: 'expn';
CONST: 'const';
REPEAT: 'repeat';
UNTIL: 'until';
THEN: 'then';
CHART: 'char';
GOTO: 'goto';
LABEL: 'label';
TYPET: 'type';
USING: 'using';
BOOLT: 'bool';
PUBLIC: 'public';
PROTECTED: 'protected';
PRIVATE: 'private';
CLASSD: 'class';
SELF: 'self';
STRUCTD: 'struct';
FROM: 'from';
XOR: 'xor';
LISTD: 'list';
TUPLED: 'tuple';
DICTD: 'dict';
SETD: 'set';
IMAGT: 'imag';
REALT: 'real';
WHERE: 'where';
PASS: 'pass';
G_G: '_G';
L_L: '_L';
HEXT: 'hex';
BINT: 'bin';
OCTT: 'oct';
MAP: 'map';
ANYT: 'any';
VOIDT: 'void';
IS: 'is';
A_FDV: '//=';
A_CDV: '*/=';
A_NOR: '||=';
A_FAC: '=!=';
A_LTE: '=<=';
A_GTE: '=>=';
A_EQL: '===';
A_NEQ: '!==';
A_CON: '..=';
A_NXR: '$$=';
A_BRS: '>>=';
A_NND: '&&=';
A_BLS: '<<=';
A_DCL: '::=';
A_CLD: ':.=';
A_KUN: '=**';
A_VUN: '=*';
A_DOT: '.=';
A_POW: '^=';
A_NOT: '=!';
A_BNT: '=~';
A_LEN: '=#';
A_PER: '=%';
A_MUL: '*=';
A_DIV: '/=';
A_MOD: '%=';
A_ADD: '+=';
A_SUB: '-=';
A_LET: '=<';
A_GRT: '=>';
A_BND: '&=';
A_BXR: '$=';
A_BOR: '|=';
A_TND: '?=';
A_TOR: ':=';
A_NML: '=';
NND: '&&';
NXR: '$$';
NOR: '||';
CLP: '()';
SUP: '::';
SIB: ':.';
KUN: '**';
INC: '++';
DEC: '+-';
FDV: '//';
CDV: '*/';
CON: '..';
BLS: '<<';
BRS: '>>';
LTE: '<=';
GTE: '>=';
EQL: '==';
NEQ: '!=';
LPR: '(';
RPR: ')';
LBR: '[';
RBR: ']';
LBC: '{';
RBC: '}';
STR: '*';
POW: '^';
PLS: '+';
MNS: '-';
BNT: '~';
EXC: '!';
LEN: '#';
PER: '%';
DIV: '/';
LET: '<';
GRT: '>';
BND: '&';
BXR: '$';
BOR: '|';
TND: '?';
TOR: ':';
DOT: '.';
RMH: 'inf';
IMH: 'inf*i';
NAN: 'nan';
IND: 'ind';
UND: 'und';
NIL: 'nil'
| 'null';
NON: 'none';
TRU: 'true';
FLS: 'false';
IDN: AUL WORD*;
WSP: (('\r'? '\n') | [\u0009\u0020\u00a0\u1680\u2000-\u200a\u202f\u205f\u3000\n\u000b\f\r\u0085\u2028\u2029])+;
BKS: [\u007f\u0008]+;
SMC: ';';
CMA: ',';
SPC: ' ';
IVC: .;
mode multiline;
SCE: ESCAPE -> type(SCN);
SND: (SDSQ | EOF) -> popMode;
SCN: '\'' ~'\'' | ~'\'';
mode multicom;
CCE: '\\##' -> type(CCN);
CND: ('##' | EOF) -> popMode;
CCN: '#' ~'#' | ~'#';
fragment DIGITE: DIGITF DIGITF;
fragment DIGITEQ: DIGITFQ DIGITFQ;
fragment DIGITF: DIGITG DIGIT;
fragment DIGITFQ: DIGITGQ DIGITQ;
fragment SPS: 'pf' | 'fp' | [rpf];
fragment OCH: ESCAPE | ALL;
fragment OQC: ESCAPE | ~['];
fragment ODQ: ESCAPE | SCN;
fragment ODC: ESCAPE | ~["];
fragment NDN: ESCAPE | NED;
fragment AUL: [a-zA-Z_];
fragment DIGIT: [0-9];
fragment DECM: INTEGER '.' INTEGER;
fragment WORD: AUL | DIGIT;
fragment DIGITQ: DIGIT?;
fragment DIGITG: DIGIT DIGIT DIGIT;
fragment DIGITGQ: DIGIT? DIGIT? DIGIT?;
fragment DIGITT: DIGIT DIGIT;
fragment DIGITTQ: DIGIT? DIGIT?;
fragment INTEGER: DIGITG ('_'? DIGITG)* | DIGIT+;
fragment ALL: .;
fragment TNQ: '\\' SLSQ | ~['];
fragment TNDQQ: '\\"' | ~["];
fragment NDH: CCN;
fragment NEN: ~[\n\u000b\f\r\u0085\u2028\u2029];
fragment NED: ~[\n\u000b\f\r\u0085\u2028\u2029"];
fragment ENN: ESCAPE | NEN;
fragment NTD: ESCAPE | NED;
fragment HEX: DIGIT | [a-fA-F];
fragment HEXQ: HEX?;
fragment SPCF: ' ';
fragment ESCAPE: '\\' ( 'x' HEX HEXQ | 'u' HEX HEXQ HEXQ HEXQ | 'U' HEX HEXQ HEXQ HEXQ HEXQ HEXQ HEXQ HEXQ | 'X' HEX HEXQ HEXQ HEXQ HEXQ HEXQ HEXQ HEXQ HEXQ HEXQ HEXQ HEXQ HEXQ HEXQ HEXQ HEXQ| 'o' OCT OCTQ OCTQ | 'O' OCT OCTQ OCTQ OCTQ OCTQ OCTQ | ESCS );
fragment ESCS: ['"nrtbfv\\];
fragment OCT: [0-7];
fragment OCTQ: OCT?;
fragment NUM: DECM | INTEGER;
fragment SLDQ: ["];
fragment SLSQ: ['];
fragment SDSQ: SLSQ SLSQ;
Parser grammar:
parser grammar TestP;
options { tokenVocab=Test; }
program: line (EOL line*) EOL? EOF;
line: expression+ SMC* | expression* SMC+;
expression: wsp* assign_expression wsp*;
assign_expression: <assoc=right> GLOBAL var_list wsp* aop wsp* (var_list wsp* aop wsp*)* or_expression (wsp* CMA wsp* or_expression)*;
or_expression: xor_expression (wsp* (NOR | OR) wsp* xor_expression)*;
xor_expression: and_expression (wsp* (NXR | XOR) wsp* and_expression)*;
and_expression: ternary (wsp* (NND | AND) wsp* ternary)*;
ternary: <assoc=right> bit_or (wsp* (AND | TND) wsp* bit_or wsp* (OR | TOR) wsp* bit_or)?;
bit_or: bit_xor (wsp* BOR wsp* bit_xor)*;
bit_xor: bit_and (wsp* BXR wsp* bit_and)*;
bit_and: eql_expr (wsp* BND wsp* eql_expr)*;
eql_expr: com_expr (wsp* (EQL | IS | NEQ | IS wsp* NOT) wsp* com_expr)*;
com_expr: shift_expr (wsp* (LET | LTE | GRT | GTE) wsp* shift_expr)*;
shift_expr: con_expr (wsp* (BLS | BRS) wsp* con_expr)*;
con_expr: one_expr (wsp* CON wsp* one_expr)*;
one_expr: perc_expr (wsp* (PLS | MNS) wsp* perc_expr)*;
perc_expr: fac_expr ((wsp* (STR | DIV | PER | FDV | CDV) wsp* fac_expr)+ | PER)*;
fac_expr: unary_expr EXC?;
unary_expr: (PLS | MNS | BNT | EXC | LEN | NOT)* crement_expr;
crement_expr: pow_expr (wsp* (INC|DEC) wsp* pow_expr | INC | DEC)*;
pow_expr: <assoc=right> unpack_expr (wsp* POW wsp* unpack_expr)*;
unpack_expr: (STR | KUN)* brackets_expr;
brackets_expr
: tupled
| LPR access_expr RPR
| dictd
| setd
| LBC access_expr RBC
| listd
| LBR access_expr RBR
| access_expr
;
var_list: assign ( wsp* CMA wsp* assign)*;
assign
: idn
| index
;
access_expr
: atom ((DOT | SUP | SIB) atom)*
;
atom
: litidn
|datat
| val
| num
| index
;
literal
: strt
| num
;
datat
: listd
| dictd
| setd
| tupled
;
listd
: LBR wsp* comma_separated wsp* RBR
| EML
;
dictd
: LBC wsp* kvpair wsp*
(
CMA
wsp*
kvpair
wsp*
)+
RBC
| LBC wsp* kvpair wsp* RBC
;
setd
: LBC wsp* comma_separated wsp* RBC
| EMS
;
tupled: LPR wsp* comma_separated wsp* RPR;
comma_separated
: expression
(
wsp*
CMA
wsp*
expression
)*
;
kvpair: expression wsp* TOR SPC wsp* expression;
litidn
: literal
| idn
;
typecast: typet LPR expression RPR;
call:
(
dictd
| idn
| index
)
(
CLP
|
tupled
)
;
strt
: multi_line
| single_line
| char_string
;
idn
: IDN
| A
;
index: (datat | strt) LBR expression RBR;
multi_line
: SMT
| USM
| NMT
| UNM
;
single_line
: SST
| USS
| NST
| UNS
| NAS
;
char_string
: SCH
| USC
| NCH
| UNC
| NAC
;
num
: exponential
| non_exponential
;
exponential
: PXI
| DXI
| PXF
| DXF
| PXB
| DXB
| PXD
| DXD
| PXP
| DXP
| PRX
| DEX
;
non_exponential
: IMG
| FLT
| DBL
| DCM
| PRC
| INT
;
typet
: STRT
| INTT
| NUMT
| DECIMALT
| FLOATT
| DOUBLET
| PRECISET
| EXPNT
| CHART
| IMAGT
| REALT
| HEXT
| BINT
| OCTT
| LISTD
| SETD
| DICTD
| TUPLED
| TYPET
| BOOLT
;
wsp
: WSP
| EOL
;
bks_or_wsp
: wsp
| BKS
| SPC
;
emd
: EML
| EMS
;
sep
: SMC
| CMA
| TOR
;
/*
kwr
: WHILE
| FOR
| DO
| DEL
| NEW
| IMPORT
| EXPORT
| DEF
| END
| GLOBAL
| BREAK
| CONTINUE
| NOT
| AND
| OR
| IN
| CASE
| DEFAULT
| RETURN
| TRY
| EXCEPT
| FINALLY
| ELIF
| IF
| ELSE
| AS
| CONST
| REPEAT
| UNTIL
| THEN
| GOTO
| LABEL
| USING
| PUBLIC
| PROTECTED
| PRIVATE
| SELF
| FROM
| XOR
| IMAGT
| REALT
| WHERE
| PASS
| G_G
| L_L
| MAP
| IS
;
ckw
: OPN
| OUT
| OUTF
| PRINT
| PRINTF
| LAMBDA
| FUNC
| ERR
| ERRF
| ASSERT
| ASSERTF
| FORMAT
| SWITCH
| ABS
| ASCII
| CALLABLE
| CHR
| DIR
| EVAL
| EXEC
| FILTER
| GET
| HASH
| ID
| INST
| SUB
| SUPER
| MAX
| MIN
| OBJ
| ORD
| POWF
| REV
| REPR
| ROUND
| FLOOR
| CEIL
| MUL
| SORT
| ADD
| ZIP
| WAIT
| SECS
| MILS
| BENCHMARK
;
*/
val
: RMH // 'inf'
| IMH // 'inf*i'
| NAN // 'nan'
| IND // 'ind'
| UND // 'und'
| NIL // 'nil'
| NON // 'none'
| TRU // 'true'
| FLS // 'false'
;
aop
: A_FDV // '//='
| A_CDV // '*/='
| A_NOR // '||='
| A_FAC // '=!='
| A_LTE // '=<='
| A_GTE // '=>='
| A_EQL // '==='
| A_NEQ // '!=='
| A_CON // '..='
| A_NXR // '$$='
| A_BRS // '>>='
| A_NND // '&&='
| A_BLS // '<<='
| A_DCL // '::='
| A_CLD // ':.='
| A_KUN // '=**'
| A_VUN // '=*'
| A_DOT // '.='
| A_POW // '^='
| A_NOT // '=!'
| A_BNT // '=~'
| A_LEN // '=#'
| A_PER // '=%'
| A_MUL // '*='
| A_DIV // '/='
| A_MOD // '%='
| A_ADD // '+='
| A_SUB // '-='
| A_LET // '=<'
| A_GRT // '=>'
| A_BND // '&='
| A_BXR // '$='
| A_BOR // '|='
| A_TND // '?='
| A_TOR // ':='
| A_NML // '='
;
opr
: NND // '&&'
| NXR // '$$'
| NOR // '||'
| CLP // '()'
| SUP // '::'
| SIB // ':.'
| KUN // '**'
| INC // '++'
| DEC // '+-'
| FDV // '//'
| CDV // '*/'
| CON // '..'
| BLS // '<<'
| BRS // '>>'
| LTE // '<='
| GTE // '>='
| EQL // '=='
| NEQ // '!='
| LPR // '('
| RPR // ')'
| LBR // '['
| RBR // ']'
| LBC // '{'
| RBC // '}'
| STR // '*'
| POW // '^'
| PLS // '+'
| MNS // '-'
| BNT // '~'
| EXC // '!'
| LEN // '#'
| PER // '%'
| DIV // '/'
| LET // '<'
| GRT // '>'
| BND // '&'
| BXR // '$'
| BOR // '|'
| TND // '?'
| TOR // ':'
| DOT // '.'
;
/*
inl
: strt
| num
| ckw
| kwr
| val
| idn
| bks_or_wsp
| sep
| emd
| aop
| opr
| typet
| IVC
;
*/
EDIT: Changed parser grammar, changed errors
Upvotes: 0
Views: 147
Reputation: 6785
There's a lot of "Stuff" in this Lexer/Parser. That's not a bad thing for a completed grammar, but I'd suggest that you "start small". (There are a lot of "red flags" that you're not really "getting" ANTLR and how it works.)
An example: your kwr parser rule appears to list the keywords from your grammar. It's quite unlikely that there's any place where you need to match "any keyword".
@momolechart is right that you don't have a path from your program rule to any rule that utilizes the GLOBAL token.
Let's start small, for just what you need for this expression:
You have:
- a GLOBAL keyword (token),
- an A token (?? this looks suspiciously like it should be an identifier, but I'll take it at face value that 'a' has a particular meaning in your context, since you defined a token rule for it),
- an A_NML token (value '=') (this just gets stranger and stranger, but I'll go with it),
- an INT token ('2'),
- WSP tokens (I'll address how whitespace is normally handled).
Let's strip the Lexer down to handle that:
lexer grammar Test;
GLOBAL: 'global';
A: 'a';
INT: INTEGER;
A_NML: '=';
SMC: ';';
EOL: '\r'? '\n' | [\n\u000b\f\r\u0085\u2028\u2029];
WSP: (('\r'? '\n') | [\u0009\u0020\u00a0\u1680\u2000-\u200a\u202f\u205f\u3000\n\u000b\f\r\u0085\u2028\u2029])+ -> skip;
fragment DIGIT: [0-9];
fragment DIGITG: DIGIT DIGIT DIGIT;
fragment INTEGER: DIGITG ('_'? DIGITG)* | DIGIT+;
(I've tried to keep as much of your original Lexer grammar as possible. I did take the liberty of handling whitespace the "normal" way, i.e. using -> skip. In your parser rules, I see a LOT of wsp*s; skipping whitespace in the lexer removes that need.)
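To make the effect of -> skip concrete, here is a minimal Python sketch (not ANTLR itself) that imitates how an ANTLR lexer picks tokens: the longest match wins, and on a tie the rule listed first wins. The rule table mirrors the stripped-down lexer above with two simplifications that are my assumptions: the whitespace set is reduced to space, tab, and newlines, and INT is reduced to plain digits. The IDN rule and the tokenize helper are hypothetical and only there to show rule priority.

```python
import re

# (name, pattern, skip) in grammar order. IDN is NOT in the stripped-down
# lexer; it is a hypothetical rule added to illustrate rule priority.
RULES = [
    ("GLOBAL", re.compile(r"global"), False),
    ("A", re.compile(r"a"), False),
    ("INT", re.compile(r"[0-9]+"), False),        # simplified: no '_' groups
    ("A_NML", re.compile(r"="), False),
    ("SMC", re.compile(r";"), False),
    ("EOL", re.compile(r"\r?\n"), False),
    ("WSP", re.compile(r"[ \t\r\n]+"), True),     # simplified set; "-> skip"
    ("IDN", re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*"), False),
]


def tokenize(text):
    """Return (rule_name, lexeme) pairs, dropping tokens from skipped rules."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pattern, skip in RULES:
            match = pattern.match(text, pos)
            # A strictly longer match replaces the current best; on a tie
            # the earlier rule is kept, like ANTLR's first-rule-wins behavior.
            if match and match.end() > pos and (best is None or match.end() > best[1]):
                best = (name, match.end(), skip)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        name, end, skip = best
        if not skip:
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens


if __name__ == "__main__":
    print(tokenize("global a = 2"))
    print(tokenize("ab"))
```

With this table, tokenize("global a = 2") yields GLOBAL, A, A_NML, INT and no whitespace tokens, which is why the parser rules no longer need wsp*. It also shows why a lone 'a' becomes an A token and never an IDN (A is listed first), while 'ab' becomes IDN (longer match).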
Now we need to recognize the assignment statement. So, a minimal Parser grammar to handle that would be:
parser grammar TestP;
options { tokenVocab=Test; }
// (EOL line)* instead of (EOL line*), so input without a trailing newline still parses
program: line (EOL line)* EOL? EOF;
line: expression+ SMC* | expression* SMC+;
expression: assign_expression ;
assign_expression: GLOBAL A A_NML INT;
That was a substantial simplification of your assign_expression, but now you're "in business"; you're recognizing your sample input. You can start building your grammar out from there.
Your immediate problem (as noted in the other answer) was that no rule referenced (directly or indirectly) from expression recognized a GLOBAL keyword.
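One way to check for this kind of problem is to treat the grammar as a graph and see which tokens can be reached from the start rule. The Python sketch below does that on a hypothetical, heavily simplified rule table (it is not your full grammar, and the GRAMMAR dict and both helper names are made up for illustration): each rule maps to the rules and tokens it references.

```python
from collections import deque

# Hypothetical, simplified rule graph: rule name -> referenced rules/tokens.
# Names that are not keys of the dict are treated as tokens.
GRAMMAR = {
    "program": ["line", "EOL", "EOF"],
    "line": ["expression", "SMC"],
    "expression": ["or_expression"],
    "or_expression": ["atom", "OR"],
    "atom": ["IDN", "INT"],
    "inl": ["kwr", "idn"],        # never referenced from program
    "kwr": ["GLOBAL", "WHILE"],
    "idn": ["IDN", "A"],
}


def reachable(grammar, start):
    """Breadth-first search over rule references, starting at `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        for symbol in grammar.get(queue.popleft(), []):
            if symbol not in seen:
                seen.add(symbol)
                queue.append(symbol)
    return seen


def unreachable_tokens(grammar, start):
    """Tokens that appear somewhere in the grammar but not on any path from `start`."""
    seen = reachable(grammar, start)
    tokens = {s for refs in grammar.values() for s in refs if s not in grammar}
    return sorted(tokens - seen)


if __name__ == "__main__":
    print(unreachable_tokens(GRAMMAR, "program"))
```

In this toy table GLOBAL shows up as unreachable from program, so a parser started at program has no alternative that accepts a GLOBAL token, which would be consistent with a "no viable alternative" error on input beginning with global.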
Now for longer-term advice. To me, these grammars look like an attempt to translate some technical spec literally (I can't imagine what the language is) into your best guess of how to represent it in ANTLR.
That is often quite ill-advised. I'd suggest backing up a bit and learning ANTLR on some simpler efforts and, once you have a grasp of ANTLR, tackling this. If you're going to tackle anything this complex, I'd also advise getting and reading The Definitive ANTLR 4 Reference from The Pragmatic Programmers. It will be well worth your time. (It actually calls out this approach of implementing tech specs in ANTLR as not such a good approach.)
Upvotes: 0
Reputation: 162
Both global and a are listed in your grammar under the kwr rule.
kwr is mentioned in the inl rule, which isn't used anywhere. So your parser doesn't know how to deal with inl, and doesn't know what to do with two inl chained together (global a).
Upvotes: 0