Gaslight Deceive Subvert
Gaslight Deceive Subvert

Reputation: 20391

Python parser for lambda calculus

For fun, I want to write a parser for the untyped Lambda calculus. The easiest approach is probably write a handwritten parser, but I wonder if there is a more Pythonic way? Specifically, I want to use a Python library that translates the syntax description for the language into a parser. Here is the language's BNF definition:

<term> ::= <var>
        |  <term> <term>
        |  λ <var> <term>

For simplicity I've omitted paranthesis rules. Application associates to the left so that x y z is (x y) z.

What Python library can take the above syntax description, or some grammar derived from it (as written, the grammar is, I believe, both ambiguous and left-recursive so it is not trival to implement), and produce a parser? I want to see how it is done using code, so please don't just answer "pyparsing can do it". Please write code along the following lines:

>>> G = """syntax description here..."""
>>> parser = build.the_parser(G)
>>> parser.parse("λ x. (y z)")
Abs('x', App(Id('x', Id('y'))))

The last line is what the produced abstract syntax tree could be. Abs stands for abstraction (lambda), App for application and Id for Identifier. I think a PEG packrat parser generator would work well here.

Upvotes: 3

Views: 846

Answers (2)

Josh Voigts
Josh Voigts

Reputation: 4132

Here's an alternative that removes the left recursion. Although visiting the syntax tree is an exercise for the OP.

from parsimonious import Grammar

grammar = Grammar(r"""
    start = term

    term = ('(' _* term1 _* ')') / term1
    term1 = app / atom1

    atom = ('(' _* atom1 _* ')') / atom1
    atom1 = abs / id

    abs = 'λ' _* id _* '.' _* term

    app = atom _+ term

    id = ~"[A-Za-z]+"

    _ = ~"[^\S\n\r]"u
""")

print(grammar.parse("x y z w"))

Upvotes: 0

Bart Kiers
Bart Kiers

Reputation: 170237

This ANTLR4 grammar does the trick:

grammar T;

program
 : term EOF
 ;

term
 : Lambda Id '.' term
 | '(' term ')'
 | term term
 | Id
 ;

Lambda
 : '\u03BB'
 ;

Id
 : [a-z] [a-zA-Z0-9]*
 ;

Spaces
 : [ \t\r\n] -> skip
 ;

Place the above in a file called T.g4. Download the ANTLR4 jar into the same folder and do:

java -cp antlr-4.7.2-complete.jar org.antlr.v4.Tool -Dlanguage=Python3 T.g4

This will create the lexer and parser files.

Now run:

from antlr4 import *
from playground.TLexer import TLexer
from playground.TParser import TParser


tests = [
  'λ x. (y z)', 
  'x y z w'
]

for test in tests:
    lexer = TLexer(InputStream(test))
    parser = TParser(CommonTokenStream(lexer))
    tree = parser.program()
    print("{}".format(tree.toStringTree(recog=parser)))

which will print:

(program (term λ x . (term ( (term (term y) (term z)) ))) <EOF>)
(program (term (term (term (term x) (term y)) (term z)) (term w)) <EOF>)

Upvotes: 1

Related Questions