Zan
Zan

Reputation: 288

Simplifying grammar via operator precedence

I am trying to parse C. I have been consulting some free-context C grammars and I have observed they usually model expressions by using "chained" production rules, for example [here][1] something like this is done to model logical or and logical and expressions:

<logical-or-expression> ::= <logical-and-expression>
                          | <logical-or-expression> || <logical-and-expression>

<logical-and-expression> ::= <inclusive-or-expression>
                           | <logical-and-expression> && <inclusive-or-expression>

I say the expressions are chained because they follow this structure:

expression with operator(N) ::= expression with operator(N+1)
        | (expression with operator(N)) operator(N) (expression with operator(N+1))

where N is the precedence of the operator. I understand that objetive is to disambiguate the language and introduce precedence and association rules in a purely syntactic manner.

Is there any reason to model expressions like this in an actual parser with operator precedence support? My initial idea was to implement them simply as:

constant_expression ::= expression1 binary_op expression2

where binary_op is any binary operation and then disambiguate by setting the precedence of all the operators. For example:

logical_expr ::= simple_expr | logical_expr && logical_expr | logical_expr || logical_expr

and then set the precedence of && operator higher than ||. I think this tactic would give a much simpler grammar, as it would eliminate the necessity of a different rule for every level of precedence but I am reluctant to use it because all the implementations I have seen use the former strategy, even in cases where the parser had precedence support.

Thanks in advance. [1]: https://cs.wmich.edu/~gupta/teaching/cs4850/sumII06/The%20syntax%20of%20C%20in%20Backus-Naur%20form.htm

Upvotes: 2

Views: 647

Answers (1)

templatetypedef
templatetypedef

Reputation: 373492

Many LR-style parsers can handle operator precedence rules using some mechanism external to the grammar itself in part because it allows you to skip this “layering” approach to writing CFGs. If you have a parser generator that supports this, it’s fine to write an ambiguous grammar and then add those external rules in to get the precedence and associativity right.

As a note - parsers for CFGs and BNF rules usually are insensitive to the order in which rules are written, so listing the operators from highest-precedence to lowest-precedence alone isn’t sufficient. (PEG parsers, on the other hand, do represent ordered choices). Also, due to how most parser generators work (having code to execute associated with each production, and using the terminals in a production to determine operator precedence), it’s often easier to have separate rules, one per binary operator, than it is to have one rule of the form “Expr Operator Expr.” But otherwise the basic approach is sound.

Upvotes: 2

Related Questions