Reputation: 211
I am implementing an extended λ-calculus interpreter using ANTLR4 and it's C++ target. Here is the language grammar:
grammar lambda;
program: expression|;
expression:
(Int | Bool) # literal
| Identifier # variable
| expression expression # application
| Lambda Identifier '.' expression # abstraction
| Identifier '=' expression # assign
| condition # conditional
| Operator expression expression # binaryExpression
| 'print' expression # printInstruction
| '(' expression ')' # brackets;
body: expression;
condition: 'if' expression 'then' body 'else' body
| '(' expression '->' body '|' body;
Lambda: '\\' | 'λ';
Bool : 'tru' | 'fls' | 'true' | 'false';
Int: [0-9]+;
Identifier: ('a' ..'z') ('a' ..'z' | '0' ..'9')*;
Operator:
'+'
| '-'
| '*'
| '/'
| '<'
| '>'
| '<='
| '>='
| '==';
WS: [ \n\t\r]+ -> skip;
I am constructing an AST using the visitor model, which will be evaluated separately. I have encountered an issue with the way ANTLR is parsing the input, and I'm not really sure even what to call it.
// incorrect_association.lambda
y = 1
x = 1
Assignment ( y = ( Application ( Literal ( 1 ) ) ( Assignment ( x = ( Literal ( 1 ) ) ) ) ) )
The AST should be
Assignment ( y = ( Literal ( 1 ) )
Assignment ( x = ( Literal ( 1 ) )
or
Grouping (
Assignment ( y = ( Literal ( 1 ) ),
Assignment ( x = ( Literal ( 1 ) )
)
I suppose this may tie into the first issue: expressions across multiple lines are being read as Application
expressions.
// incorrect_application.lambda
x = 1
print x
Assignment ( x = ( Application ( Literal ( 1 ) ) ( PrintInstruction ( Identifier ( "x" ) ) ) ) )
The AST should be
Assignment ( x = ( Literal ( 1 ) )
PrintInstruction ( Identifier ( "x" ) )
or
Grouping (
Assignment ( x = ( Literal ( 1 ) ),
PrintInstruction ( Identifier ( "x" ) )
)
I am trying to have imperative-like constant assignments, with a functional-like execution. Eventually, the program should just be whatever main = ...
(like Haskell). Is it possible to prevent the Application
rule from matching two expressions that are on different lines, but continue to allow any other whitespace and parens?
I was contemplating writing a preprocessor that would just throw semicolons on each line ending. I may need to do this anyway because I plan on adding
imports: 'import' Identifier | '(' imports ')';
as a grammar rule, and haven't found a nice solution for handling imports with ANTLR. If I were to go this route, how would I include ;
line endings in my grammar?
PS: I am very new to ANTLR, so any guidance would be super helpful.
Upvotes: 1
Views: 102
Reputation: 241681
If you want newlines to be significant, then let them pass through the lexical scanner.
WS: [ \t\r]+ -> skip;
NL: [\n];
Then you could define a program as a sequence of expressions terminated with newline characters:
program: ( expression NL )*;
If you want semicolons to work as well, just change the definition of NL:
NL: [\n;];
You'll also want to change body
to accept multiple expressions, although it's not clear to me what kind of punctuation you will want to use. It's possible that
body: expression (NL expression)*;
will work for you, but it might produce unexpected results.
Your application syntax is highly ambiguous. I have no idea what Antlr will make of it, but I can't interpret it. If you have
+ a b c
That must be one of:
(+ a b) (c)
(+ a (b c))
(+ (a b) c)
But I don't see any indication of which of the three should be prefered. I would think that you would need to come up with a grammar which has more precise precedence.
(There is a reason Lisp and Scheme use parentheses :-) )
Upvotes: 1