szydan

Reputation: 2596

ANTLR grammar for parsing a simple expression

I would like to parse the following expressions with ANTLR 4:

termspannear ( xxx, xxx , 5 , true ) 

termspannear ( xxx, termspannear ( xxx, xxx , 5 , true ) , 5 , true ) 

The termspannear functions can be nested.

Here is my grammar:

// Define a grammar to parse TermSpanNear
grammar TermSpanNear;
start       : TERMSPAN ;

TERMSPAN    : TERMSPANNEAR | 'xxx' ;
TERMSPANNEAR: 'termspannear' OPENP BODY CLOSEP ;
BODY        : TERMSPAN COMMA TERMSPAN COMMA SLOP COMMA ORDERED ;
COMMA       : ',' ;
OPENP       : '(' ;
CLOSEP      : ')' ;
SLOP        : [0-9]+ ;
ORDERED     : 'true' | 'false' ;
WS          : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

After running:

antlr4 TermSpanNear.g4
javac TermSpanNear*.java
grun TermSpanNear start -gui
termspannear ( xxx, xxx , 5 , true )
^D
line 1:0 token recognition error at: 'termspannear '
line 1:13 extraneous input '(' expecting TERMSPAN

and the tree looks like:

[image: parse tree]

Can someone help me with this grammar, so that the parse tree contains all the parameters and nesting also works?

NOTE: After the suggestion I rewrote it to:

// Define a grammar to parse TermSpanNear
grammar TermSpanNear;
start       : termspan EOF;

termspan    : termspannear | 'xxx' ;
termspannear: 'termspannear' '('  body  ')' ;
body        : termspan ',' termspan ',' SLOP ',' ORDERED ;

SLOP        : [0-9]+ ;
ORDERED     : 'true' | 'false' ;
WS          : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines

I think it works now. I'm getting the following trees. For

termspannear ( xxx, xxx , 5 , true ) 

[image: parse tree]

For
termspannear ( xxx, termspannear ( xxx, xxx , 5 , true ) , 5 , true )

[image: parse tree]

Upvotes: 2

Views: 707

Answers (1)

Bart Kiers

Reputation: 170278

You're using way too many lexer rules.

When you're defining a token like this:

BODY        : TERMSPAN COMMA TERMSPAN COMMA SLOP COMMA ORDERED ;

then the tokenizer (lexer) will try to create the (single!) token: xxx,xxx,5,true. That is, it does not allow any spaces in between the parts. Lexer rules (the ones starting with a capital letter) should really be the "atoms" of your language (the smallest parts). Whenever you start creating elements like a body, you glue atoms together in parser rules, not in lexer rules.

Try something like this:

grammar TermSpanNear;

// parser rules (the elements)
start          : termspan EOF ;
termspan       : termspannear | 'xxx' ;
termspannear   : 'termspannear' OPENP body CLOSEP ;
body           : termspan COMMA termspan COMMA SLOP COMMA ORDERED ;

// lexer rules (the atoms)
COMMA          : ',' ;
OPENP          : '(' ;
CLOSEP         : ')' ;
SLOP           : [0-9]+ ;
ORDERED        : 'true' | 'false' ;
WS             : [ \t\r\n]+ -> skip ;
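
To make the atoms-versus-elements split concrete, here is a minimal hand-written sketch in Java of what the two layers do. It is not the code ANTLR generates. The class name TermSpanSketch, the methods tokenize and parse, and the "(near ...)" output format are all made up for illustration. The tokenizer only produces atoms and drops whitespace (like the WS rule), while the recursive methods play the role of the parser rules.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: a hand-written lexer/parser pair, not ANTLR output. */
final class TermSpanSketch {

    /** Lexer: emits atoms (words, numbers, punctuation) and skips whitespace. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // same effect as "WS -> skip"
            } else if (c == ',' || c == '(' || c == ')') {
                tokens.add(String.valueOf(c));
                i++;
            } else if (Character.isLetterOrDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("token recognition error at: '" + c + "'");
            }
        }
        return tokens;
    }

    private final List<String> tokens;
    private int pos = 0;

    private TermSpanSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    /** start : termspan EOF ; */
    static String parse(String input) {
        TermSpanSketch p = new TermSpanSketch(tokenize(input));
        String tree = p.termspan();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("extraneous input '" + p.peek() + "'");
        }
        return tree;
    }

    /** termspan : termspannear | 'xxx' ; */
    private String termspan() {
        if ("xxx".equals(peek())) {
            next();
            return "xxx";
        }
        return termspannear();
    }

    /** termspannear : 'termspannear' '(' body ')' ; (body is inlined here) */
    private String termspannear() {
        expect("termspannear");
        expect("(");
        String left = termspan();   // recursion is what makes nesting work
        expect(",");
        String right = termspan();
        expect(",");
        String slop = next();
        if (!slop.matches("[0-9]+")) {
            throw new IllegalArgumentException("expecting SLOP but got '" + slop + "'");
        }
        expect(",");
        String ordered = next();
        if (!ordered.equals("true") && !ordered.equals("false")) {
            throw new IllegalArgumentException("expecting ORDERED but got '" + ordered + "'");
        }
        expect(")");
        return "(near " + left + " " + right + " " + slop + " " + ordered + ")";
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private String next() {
        String t = peek();
        if (pos < tokens.size()) {
            pos++;
        }
        return t;
    }

    private void expect(String expected) {
        String t = next();
        if (!expected.equals(t)) {
            throw new IllegalArgumentException("expecting '" + expected + "' but got '" + t + "'");
        }
    }
}
```

With this sketch, TermSpanSketch.parse("termspannear ( xxx, xxx , 5 , true )") returns "(near xxx xxx 5 true)", and the nested input from the question parses the same way because termspan calls itself through termspannear. Whitespace never reaches the parsing methods, which is why spaces around the commas stop being a problem once body is a parser rule.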

Upvotes: 1
