Reputation: 109
I would like an example of a PEG grammar for deeply nested Python boolean expressions. The PEG grammar can't be "very recursive" or this will lead to stack overflow. Support for '|', '&', '(' and ')' required.
Example of input: (a=1|b=1|c=1|d=1&e=2)&(f=2&g=2&h=2|i=3|j=3|k=3)
Upvotes: 2
Views: 1355
Reputation: 95410
To parse typical (especially Boolean logic) expressions, you hardly need a Peg parser; there's never really a need to backtrack. Pick any parsing engine and use it (including a Peg engine if you insist).
Here's a grammar that will work for your example:
EXPRESSION = DISJUNCTION ;
DISJUNCTION = CONJUNCTION ( '|' CONJUNCTION )*;
CONJUNCTION = TERM ( '&' TERM );
TERM = '~' TERM | '(' EXPRESSION ')' | CONDITION ;
CONDITION = VARIABLE '=' CONSTANT ;
If you want to implement this grammar using a hand-written, recursive descent parser that is easy to code, see my SO answer on how to do this: Is there an alternative for flex/bison that is usable on 8-bit embedded systems?
Regarding overflowing the stack: unless your expression is nested hundreds of levels ("parentheses") deep, on most windows or linux system, the available stack is far bigger than the stack demands of a parser applied to expressions that people write in practice. With huge expressions, pretty much people can't read them anyway so they don't occur. OP's example expression requires nesting of just a few stack entries.
If you write a grammar for a full-fledged programming language, and somebody writes a big, complex program in that langauge, you might risk overflowing the stack. I can tell you from experience with a compiler I built that a nesting of 256 (stack frames) fits easily in Windows 1Mb stack space, and that's enough to compile 2 million line program with every strange construct and deeply nested lexical scoping stunt known to mankind in it.
Upvotes: 3
Reputation: 303441
Here's a PEG grammar that supports the simple boolean expression you have above (and a little more), with the assumption that AND should bind more tightly than OR:
expr = multiOR !.
multiOR = multiAND (_wsp opOR _wsp multiAND)*
multiAND = comparison (_wsp opAND _wsp comparison)*
comparison = varOrNum _wsp opCOMP _wsp varOrNum
/ '!'? var
/ '!'? '(' multiOR ')'
varOrNum = var / num
var = [a-z]i [a-z0-9_]i*
num = '-'? [0-9]+ ('.' [0-9]+)?
opOR = '|' '|'?
opAND = '&' '&'?
opCOMP = [<>=] '='? / [≤≥≠] / '!='
_wsp = [ \t]*
Here you can see the result of applying this PEG to input like yours:
Obviously the comparison operators are a little overboard, and it supports both |
/&
and ||
/&&
for the boolean operators. Feel free to modify as desired.
Upvotes: 1