K810
K810

Reputation: 3

Parsing + and * in boolean expressions by recursive descent

I am writing a recursive descent parser for Boolean expressions, for example:

(1 * 0) 
(0 + ~1)
(0 * (1 + c)

Where 1 is 'True', 0 is 'False', + is 'or', * is 'and', ~ is 'not' and 'c' is just some variable name (it could be any single alphabetic letter). I plan on using parentheses rather than implementing some kind of order of operations.

My current parser can recognize the following form of expression

Expression ::= 1
             | 0 
             | Character
             | ~ Expression

But I am unsure as to how I would implement + and * on top of this. I am fairly certain from what I have read the obvious implementation of

Expression ::= 1
             | 0
             | Character
             | ( Expression + Expression )
             | ( Expression * Expression )

Would cause an infinite loop as it is 'left-recursive'. I am unsure how to change this to remove such infinite recursion.

Upvotes: 0

Views: 575

Answers (2)

Prune
Prune

Reputation: 77850

Let me search that for you ... https://en.wikipedia.org/wiki/Recursive_descent_parser

The leading LPAREN keeps this from being left-recursive. If you want to generalize the expressions and have some operator precedence, follow the expression portion of the BNF in the Wikipedia article.

However, you have a syntax ambiguity in the grammar you've chosen. When you have operators of the same precedence, combine them into a non-terminal, such as

LogOp ::= + | *

Label similar operands to allow for expansion:

UnaryOp ::= ~

Now you can ... never mind, @500 just posted a good answer that covers my final point.

Upvotes: 0

With the parenthesis in place, what you have there is not left recursive. Left recursion is when a production can reach itself (directly or indirectly) with no tokens consumed in between. Such grammars do indeed cause infinite recursion in recursive descent parsers, but that can't happen with yours.

You do have the issue that the grammar as it stands is ambiguous: After a parenthesis, it isn't known whether the + or the * form is being parsed until the entire left-hand expression has been parsed.

One way of getting around that issue is by pulling up the common parts in a shared prefix/suffix production:

Expression ::= 1
             | 0
             | Character
             | ParExpr

ParExpr ::= ( Expression ParOp  Expression )

ParOp ::= +
        | *

Upvotes: 1

Related Questions