kino
kino

Reputation: 43

PEG recursive grammar

I'm trying to write a Tiger parser. I initially used PyPEG, but due to some difficulties, I went with Arpeggio.

My grammar is really simple.

def number(): return _(r'[0-9]+')
def string(): return _(r"\".*?\"")
def id(): return _(r'[a-zA-Z][a-zA-Z0-9_]*')

def literal(): return [number, string]

def simple_var(): return id

def let_in_exp(): return 'let', 'in', Optional(ZeroOrMore(exp)), 'end'

param = [number, string]
params = Optional(param, ZeroOrMore(',', param))

def function_call(): return id, '(', params, ')'

exp = [let_in_exp, simple_var, literal, function_call]

def code(): return OneOrMore(exp), EOF

The difficulty resides in the let-in-exp expression. let in let in let in end end end is valid Tiger.

However - currently - Arpeggio doesn't recognize the let-in-exp as is, but as three simple-var. Indeed, going into the ZeroOrMore(exp), it consumes the end, and so doesn't find it for the let-in-exp.

How can one resolve such problem?

Upvotes: 3

Views: 933

Answers (2)

Igor Dejanović
Igor Dejanović

Reputation: 943

As Paul already noted, you should use Not syntactic predicate to avoid matching keywords by simple_var rule. Also, I suggest not to wrap ZeroOrMore parsing expression in Optional as it is already implied.

Solution for Arpeggio is

from arpeggio import Not, OneOrMore, ZeroOrMore, Optional, EOF, ParserPython
from arpeggio import RegExMatch as _

keyword = ['let', 'in', 'end']   
def number(): return _(r'[0-9]+')
def string(): return _(r"\".*?\"")
def id(): return _(r'[a-zA-Z][a-zA-Z0-9_]*')

def literal(): return [number, string]
def simple_var(): return Not(keyword), id
def let_in_exp(): return 'let', 'in', ZeroOrMore(exp), 'end'

param = [number, string]
params = Optional(param, ZeroOrMore(',', param))

def function_call(): return id, '(', params, ')'

exp = [let_in_exp, simple_var, literal, function_call]

def code(): return OneOrMore(exp), EOF

parser = ParserPython(code, debug=True)
test = 'let in 42 let in "foo" let in end end end'
parse_tree = parser.parse(test)

This will yield the following parse tree

enter image description here

Upvotes: 3

PaulMcG
PaulMcG

Reputation: 63709

Not an Arpeggio solution, but perhaps more idiomatic to your taste?

from pyparsing import (CaselessKeyword,Word,nums,QuotedString,alphas,alphanums,
    Forward,Group,Optional,OneOrMore,ZeroOrMore,delimitedList)

LET,IN,END = map(CaselessKeyword, "let in end".split())

number = Word(nums).setName("number")
string = QuotedString('"')
ident = ~(LET | IN | END) + Word(alphas, alphanums+'_')
ident.setName("ident")

literal = number | string

simple_var = ident

exp = Forward().setName("exp")
let_in_exp = Group(LET + IN + ZeroOrMore(exp) + END).setName("let_in_exp")

param = number | string
params = delimitedList(param)
function_call = ident() + '(' + Optional(params) + ')'

exp <<= let_in_exp | simple_var | literal | function_call

code = OneOrMore(exp)

tests = """\
    let in let in let in end end end
    let in let in let in "blah" end end end
    let in let in let in "blah" end 1729 end end
    """
code.runTests(tests)

I designed pyparsing to permit combining of expressions using arithmetic operators:

  • + -> And
  • | -> Match First
  • ^ -> Or (try all, match longest)
  • ~ -> Not
  • & -> Each (same as And, but in any order)
  • * -> multiple (as in expr*3 instead of expr+expr+expr)

I believe these operators and the plain language class names like OneOrMore make these parsers easier to understand, and to maintain over time.

Upvotes: 3

Related Questions