Reputation: 20391
For fun, I want to write a parser for the untyped Lambda calculus. The easiest approach is probably write a handwritten parser, but I wonder if there is a more Pythonic way? Specifically, I want to use a Python library that translates the syntax description for the language into a parser. Here is the language's BNF definition:
<term> ::= <var>
| <term> <term>
| λ <var> <term>
For simplicity I've omitted paranthesis rules. Application associates
to the left so that x y z
is (x y) z
.
What Python library can take the above syntax description, or some grammar derived from it (as written, the grammar is, I believe, both ambiguous and left-recursive so it is not trival to implement), and produce a parser? I want to see how it is done using code, so please don't just answer "pyparsing can do it". Please write code along the following lines:
>>> G = """syntax description here..."""
>>> parser = build.the_parser(G)
>>> parser.parse("λ x. (y z)")
Abs('x', App(Id('x', Id('y'))))
The last line is what the produced abstract syntax tree could be. Abs stands for abstraction (lambda), App for application and Id for Identifier. I think a PEG packrat parser generator would work well here.
Upvotes: 3
Views: 846
Reputation: 4132
Here's an alternative that removes the left recursion. Although visiting the syntax tree is an exercise for the OP.
from parsimonious import Grammar
grammar = Grammar(r"""
start = term
term = ('(' _* term1 _* ')') / term1
term1 = app / atom1
atom = ('(' _* atom1 _* ')') / atom1
atom1 = abs / id
abs = 'λ' _* id _* '.' _* term
app = atom _+ term
id = ~"[A-Za-z]+"
_ = ~"[^\S\n\r]"u
""")
print(grammar.parse("x y z w"))
Upvotes: 0
Reputation: 170237
This ANTLR4 grammar does the trick:
grammar T;
program
: term EOF
;
term
: Lambda Id '.' term
| '(' term ')'
| term term
| Id
;
Lambda
: '\u03BB'
;
Id
: [a-z] [a-zA-Z0-9]*
;
Spaces
: [ \t\r\n] -> skip
;
Place the above in a file called T.g4
. Download the ANTLR4 jar into the same folder and do:
java -cp antlr-4.7.2-complete.jar org.antlr.v4.Tool -Dlanguage=Python3 T.g4
This will create the lexer and parser files.
Now run:
from antlr4 import *
from playground.TLexer import TLexer
from playground.TParser import TParser
tests = [
'λ x. (y z)',
'x y z w'
]
for test in tests:
lexer = TLexer(InputStream(test))
parser = TParser(CommonTokenStream(lexer))
tree = parser.program()
print("{}".format(tree.toStringTree(recog=parser)))
which will print:
(program (term λ x . (term ( (term (term y) (term z)) ))) <EOF>)
(program (term (term (term (term x) (term y)) (term z)) (term w)) <EOF>)
Upvotes: 1