Reputation: 43
I'm trying to write a Tiger parser. I initially used PyPEG, but due to some difficulties, I went with Arpeggio.
My grammar is really simple.
def number(): return _(r'[0-9]+')
def string(): return _(r"\".*?\"")
def id(): return _(r'[a-zA-Z][a-zA-Z0-9_]*')
def literal(): return [number, string]
def simple_var(): return id
def let_in_exp(): return 'let', 'in', Optional(ZeroOrMore(exp)), 'end'
param = [number, string]
params = Optional(param, ZeroOrMore(',', param))
def function_call(): return id, '(', params, ')'
exp = [let_in_exp, simple_var, literal, function_call]
def code(): return OneOrMore(exp), EOF
The difficulty resides in the let-in-exp
expression.
let in let in let in end end end
is valid Tiger.
However - currently - Arpeggio doesn't recognize the let-in-exp
as is, but as three simple-var
. Indeed, going into the ZeroOrMore(exp), it consumes the end
, and so doesn't find it for the let-in-exp
.
How can one resolve such problem?
Upvotes: 3
Views: 933
Reputation: 943
As Paul already noted, you should use Not
syntactic predicate to avoid matching keywords by simple_var
rule. Also, I suggest not to wrap ZeroOrMore
parsing expression in Optional
as it is already implied.
Solution for Arpeggio is
from arpeggio import Not, OneOrMore, ZeroOrMore, Optional, EOF, ParserPython
from arpeggio import RegExMatch as _
keyword = ['let', 'in', 'end']
def number(): return _(r'[0-9]+')
def string(): return _(r"\".*?\"")
def id(): return _(r'[a-zA-Z][a-zA-Z0-9_]*')
def literal(): return [number, string]
def simple_var(): return Not(keyword), id
def let_in_exp(): return 'let', 'in', ZeroOrMore(exp), 'end'
param = [number, string]
params = Optional(param, ZeroOrMore(',', param))
def function_call(): return id, '(', params, ')'
exp = [let_in_exp, simple_var, literal, function_call]
def code(): return OneOrMore(exp), EOF
parser = ParserPython(code, debug=True)
test = 'let in 42 let in "foo" let in end end end'
parse_tree = parser.parse(test)
This will yield the following parse tree
Upvotes: 3
Reputation: 63709
Not an Arpeggio solution, but perhaps more idiomatic to your taste?
from pyparsing import (CaselessKeyword,Word,nums,QuotedString,alphas,alphanums,
Forward,Group,Optional,OneOrMore,ZeroOrMore,delimitedList)
LET,IN,END = map(CaselessKeyword, "let in end".split())
number = Word(nums).setName("number")
string = QuotedString('"')
ident = ~(LET | IN | END) + Word(alphas, alphanums+'_')
ident.setName("ident")
literal = number | string
simple_var = ident
exp = Forward().setName("exp")
let_in_exp = Group(LET + IN + ZeroOrMore(exp) + END).setName("let_in_exp")
param = number | string
params = delimitedList(param)
function_call = ident() + '(' + Optional(params) + ')'
exp <<= let_in_exp | simple_var | literal | function_call
code = OneOrMore(exp)
tests = """\
let in let in let in end end end
let in let in let in "blah" end end end
let in let in let in "blah" end 1729 end end
"""
code.runTests(tests)
I designed pyparsing to permit combining of expressions using arithmetic operators:
+
-> And |
-> Match First ^
-> Or (try all, match longest) ~
-> Not &
-> Each (same as And, but in any order)*
-> multiple (as in expr*3
instead of expr+expr+expr
)I believe these operators and the plain language class names like OneOrMore
make these parsers easier to understand, and to maintain over time.
Upvotes: 3