Santanu
Santanu

Reputation: 113

peg/leg - some curious behavior of PEG specification for expression parsing

Recently I was learning about Parsing Expression Grammars. After reading some materials on it by searching the web I finally sat down to make my hands dirty. I thought of implementing the hoc program from The Unix Programming Environment by Kernighan and Pike, with PEG. For that matter I chose PEG parser generator peg/leg as it has good documentation and an extensive collection of examples. I took dc.peg from the examples as my starting point and just implemented support for unary minus in phoc version 1.

Now when I write the following productions for unary minus the generated parser is not recognizing ((-9)), ((-9)*3)/4, (8-(-2)*5)/2-(2+2), etc.

Operand     <-    SGNValue
                  / Value
SGNValue    <-    OPEN? '-' Value CLOSE?
Value       <-    NUMBER                                     
                  / OPEN Sum CLOSE

But if I change SGNValue to the following those expressions are perfectly matched.

SGNValue    <-    '-' NUMBER                                        
                  / '-' OPEN Sum CLOSE

Could anyone here please explain to me what was wrong with the above production rule?

I am also attaching phoc1.peg file with this post.

Thanks and regards Santanu

# Grammar

Expr        <-    SPACE Sum EOL                                      { printf("%f\n", pop()); }
                  / (!(EOL / 'q') .)* EOL                            { printf("error: quiting hoc1\n"); \
                                                                   exit(1); }
                  / 'q' EOL                                          { printf("bye\n"); exit(0);}

Sum         <-    Product ( PLUS  Product                            { double r = pop(), l = pop(); \
                                                                   push(l + r); }
                  / MINUS Product                                    { double r = pop(), l = pop(); \
                                                                   push(l - r); }
                  )*

Product     <-    Operand ( TIMES  Operand                           { double r = pop(), l = pop(); \
                                                                   push(l * r); }
                  / DIVIDE Operand                                   { double r = pop(), l = pop(); \
                                                                   push(l / r); }
                  )*

Operand     <-    SGNValue
                  / Value

## SGNValue    <-    OPEN? '-' Value CLOSE?
## This production rule was throwing error
## for expressions like ((-9)), ((-9)*3)/4,
## (8-(-2)*5)/2-(2+2), etc. probably due to
## left recurssion. But it was behaving as expected
## for (((9))), ((9)*2), etc.

SGNValue    <-    '-' NUMBER                                         { push(-1*atof(yytext)); }
                  / '-' OPEN Sum CLOSE                               { double d = pop(); push(-1*d); }

Value       <-    NUMBER                                             { push(atof(yytext)); }
                  / OPEN Sum CLOSE

# Lexemes

NUMBER  <-  < [0-9]+ ('.' [0-9]+)? >  SPACE
PLUS    <-  '+'                       SPACE
MINUS   <-  '-'                       SPACE
TIMES   <-  '*'                       SPACE
DIVIDE  <-  '/'                       SPACE
OPEN    <-  '('                       SPACE
CLOSE   <-  ')'                       SPACE
SPACE   <-  [ \t]*
EOL     <-  '\n' / '\r\n' / '\r'

Upvotes: 0

Views: 282

Answers (1)

rici
rici

Reputation: 241671

When you write:

SGNValue    <-    OPEN? '-' Value CLOSE?

you're saying that both the OPEN and the CLOSE are optional, which means that either or both may be present. It will happily match, for example, -9), where only the OPEN is missing.

So the prefix ((-9 matches VALUE <- OPEN Sum CLOSE leaving (-9 to match SUM, which recurses to OPERAND which matches VALUE / SGNValue; remember that PEG's / operator is order-aware, so it first tries VALUE which again matches the ( and recurses. Then we have -9 which can't match VALUE but it can certainly match SGNValue. Unfortunately, -9) matches SGNValue, as noted above, after which there are not enough close parentheses to match the pending open parentheses from the two VALUEs.

I'm not a big fan of PEG, and order-aware alternation is one of the reasons. But that's a very personal opinion; if they work for you, more power to you. OPEN? Sum CLOSE? would be incorrect in any BNF-style grammar formalism, but normally it would lead to accepting erroneous sentences rather than rejecting correct ones.

Upvotes: 3

Related Questions