Reputation: 4785
I have a left recursive issue in my Antlr grammar. While I think I understand why there is a problem I am unable to think of a solution. The issue is with the last line for my datatype rule. I have included the entire grammar for you to see:
grammar Test;
options {output=AST;ASTLabelType=CommonTree;}
tokens {FUNCTION; ATTRIBUTES; CHILDREN; COMPOSITE;}
program : function ;
function : ID (OPEN_BRACKET (attribute (COMMA? attribute)*)? CLOSE_BRACKET)? (OPEN_BRACE function* CLOSE_BRACE)? SEMICOLON? -> ^(FUNCTION ID ^(ATTRIBUTES attribute*) ^(CHILDREN function*)) ;
attribute : ID (COLON | EQUALS) datatype -> ^(ID datatype);
datatype : ID -> ^(STRING["id"] ID)
| NUMBER -> ^(STRING["number"] NUMBER)
| STRING -> ^(STRING["string"] STRING)
| BOOLEAN -> ^(STRING["boolean"] BOOLEAN)
| array -> ^(STRING["array"] array)
| lookup -> ^(STRING["lookup"] lookup)
| datatype PLUS datatype -> ^(COMPOSITE datatype datatype) ;
array : OPEN_BOX (datatype (COMMA datatype)*)? CLOSE_BOX -> datatype* ;
lookup : OPEN_BRACE (ID (PERIOD ID)*) CLOSE_BRACE -> ID* ;
NUMBER
: ('+' | '-')? (INTEGER | FLOAT)
;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
BOOLEAN
: 'true' | 'TRUE' | 'false' | 'FALSE'
;
ID : (LETTER|'_') (LETTER | INTEGER |'_')*
;
COMMENT
: '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
WHITESPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;
COLON : ':' ;
SEMICOLON : ';' ;
COMMA : ',' ;
PERIOD : '.' ;
PLUS : '+' ;
EQUALS : '=' ;
OPEN_BRACKET : '(' ;
CLOSE_BRACKET : ')' ;
OPEN_BRACE : '{' ;
CLOSE_BRACE : '}' ;
OPEN_BOX : '[' ;
CLOSE_BOX : ']' ;
fragment
LETTER
: 'a'..'z' | 'A'..'Z'
;
fragment
INTEGER
: '0'..'9'+
;
fragment
FLOAT
: INTEGER+ '.' INTEGER*
;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
;
I am developing using Antlr works which provides a function to resolve this issue - but unfortunately it does not seem to work :s
Any help would be great.
Thanks.
EDIT:
Here is an example of the language I'm trying to implement / parse
<FunctionName> <OptionalAttributes> <OptionalChildFunctions>
So for example:
ForEach(in:[1,2,3,4,5] as:"i") {
Switch(value:{i}) {
Case(value:3) {
Print(message:"This is the number 3")
}
Default {
Print(message:"This isn't the number 3")
}
}
}
Upvotes: 2
Views: 1507
Reputation: 170148
Okay, this should do the trick:
grammar Test;
/************************************** PARSER **************************************/
program
: function EOF
;
function
: ID (OPEN_PAREN (attribute (COMMA attribute)*)? CLOSE_PAREN)?
(OPEN_BRACE function* CLOSE_BRACE)?
SEMICOLON?
;
attribute
: ID (COLON | EQUALS)? expression
;
expression
: atom (PLUS atom)*
;
atom
: ID
| STRING
| BOOLEAN
| NUMBER
| array
| lookup
;
array
: OPEN_BOX (expression (COMMA expression)*)? CLOSE_BOX
;
lookup
: OPEN_BRACE (ID (PERIOD ID)*) CLOSE_BRACE
;
/************************************** LEXER **************************************/
NUMBER : ('+' | '-')? (INTEGER | FLOAT)
;
STRING : '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
BOOLEAN : 'true' | 'TRUE' | 'false' | 'FALSE'
;
ID : (LETTER|'_') (LETTER | INTEGER |'_')*
;
COMMENT : '//' ~('\n'|'\r')* ('\r'? '\n'| EOF) {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
WHITESPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;
COLON : ':' ;
SEMICOLON : ';' ;
COMMA : ',' ;
PERIOD : '.' ;
PLUS : '+' ;
EQUALS : '=' ;
OPEN_PAREN : '(' ;
CLOSE_PAREN : ')' ;
OPEN_BRACE : '{' ;
CLOSE_BRACE : '}' ;
OPEN_BOX : '[' ;
CLOSE_BOX : ']' ;
fragment
LETTER : 'a'..'z' | 'A'..'Z' ;
fragment
INTEGER : '0'..'9'+ ;
fragment
FLOAT : INTEGER+ '.' INTEGER* ;
fragment
ESC_SEQ : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\') ;
Note that I've changed the name of OPEN_BRACKET
and CLOSE_BRACKET
into OPEN_PAREN
and CLOSE_PAREN
. The round ones, (
and )
, are parenthesis, the square ones, [
and ]
, are called brackets (the ones you called boxes, but calling them boxes doesn't hurt IMO).
Upvotes: 2