Reputation: 113
Recently I was learning about Parsing Expression Grammars. After reading some materials on it by searching the web I finally sat down to make my hands dirty. I thought of implementing the hoc program from The Unix Programming Environment by Kernighan and Pike, with PEG. For that matter I chose PEG parser generator peg/leg as it has good documentation and an extensive collection of examples. I took dc.peg from the examples as my starting point and just implemented support for unary minus in phoc version 1.
Now when I write the following productions for unary minus the generated parser is not recognizing ((-9)), ((-9)*3)/4, (8-(-2)*5)/2-(2+2), etc.
Operand <- SGNValue
/ Value
SGNValue <- OPEN? '-' Value CLOSE?
Value <- NUMBER
/ OPEN Sum CLOSE
But if I change SGNValue to the following those expressions are perfectly matched.
SGNValue <- '-' NUMBER
/ '-' OPEN Sum CLOSE
Could anyone here please explain to me what was wrong with the above production rule?
I am also attaching phoc1.peg file with this post.
Thanks and regards Santanu
# Grammar
Expr <- SPACE Sum EOL { printf("%f\n", pop()); }
/ (!(EOL / 'q') .)* EOL { printf("error: quiting hoc1\n"); \
exit(1); }
/ 'q' EOL { printf("bye\n"); exit(0);}
Sum <- Product ( PLUS Product { double r = pop(), l = pop(); \
push(l + r); }
/ MINUS Product { double r = pop(), l = pop(); \
push(l - r); }
)*
Product <- Operand ( TIMES Operand { double r = pop(), l = pop(); \
push(l * r); }
/ DIVIDE Operand { double r = pop(), l = pop(); \
push(l / r); }
)*
Operand <- SGNValue
/ Value
## SGNValue <- OPEN? '-' Value CLOSE?
## This production rule was throwing error
## for expressions like ((-9)), ((-9)*3)/4,
## (8-(-2)*5)/2-(2+2), etc. probably due to
## left recurssion. But it was behaving as expected
## for (((9))), ((9)*2), etc.
SGNValue <- '-' NUMBER { push(-1*atof(yytext)); }
/ '-' OPEN Sum CLOSE { double d = pop(); push(-1*d); }
Value <- NUMBER { push(atof(yytext)); }
/ OPEN Sum CLOSE
# Lexemes
NUMBER <- < [0-9]+ ('.' [0-9]+)? > SPACE
PLUS <- '+' SPACE
MINUS <- '-' SPACE
TIMES <- '*' SPACE
DIVIDE <- '/' SPACE
OPEN <- '(' SPACE
CLOSE <- ')' SPACE
SPACE <- [ \t]*
EOL <- '\n' / '\r\n' / '\r'
Upvotes: 0
Views: 282
Reputation: 241671
When you write:
SGNValue <- OPEN? '-' Value CLOSE?
you're saying that both the OPEN
and the CLOSE
are optional, which means that either or both may be present. It will happily match, for example, -9)
, where only the OPEN
is missing.
So the prefix ((-9
matches VALUE <- OPEN Sum CLOSE
leaving (-9
to match SUM
, which recurses to OPERAND
which matches VALUE / SGNValue
; remember that PEG's / operator is order-aware, so it first tries VALUE
which again matches the (
and recurses. Then we have -9
which can't match VALUE
but it can certainly match SGNValue
. Unfortunately, -9)
matches SGNValue
, as noted above, after which there are not enough close parentheses to match the pending open parentheses from the two VALUE
s.
I'm not a big fan of PEG, and order-aware alternation is one of the reasons. But that's a very personal opinion; if they work for you, more power to you. OPEN? Sum CLOSE?
would be incorrect in any BNF-style grammar formalism, but normally it would lead to accepting erroneous sentences rather than rejecting correct ones.
Upvotes: 3