Reputation: 672
I am trying to figure out some details involving parsing expression grammars, and am stuck on the following question:
For the given grammar:
a = b Z
b = Z Z | Z
(where lower-case letters indicate productions, and uppercase letters indicate terminals). Is the production "a" supposed to match against the string "Z Z"?
Here is the pseudo-code that I've seen the above grammar get translated to, where each production is mapped to a function that outputs two values. The first indicates whether the parse succeeded. And the second indicates the resulting position in the stream after the parse.
defn parse-a (i:Int) -> [True|False, Int] :
val [r1, i1] = parse-b(i)
if r1 : eat("Z", i1)
else : [false, i]
defn parse-b1 (i:Int) -> [True|False, Int] :
val [r1, i1] = eat("Z", i)
if r1 : eat("Z", i1)
else : [false, i]
defn parse-b2 (i:Int) -> [True|False, Int] :
eat("Z", i)
defn parse-b (i:Int) -> [True|False, Int] :
val [r1, i1] = parse-b1(i)
if r1 : [r1, i1]
else : parse-b2(i)
The above code will fail when trying to parse the production "a" on the input "Z Z". This is because the parsing function for "b" is incorrect. It will greedily consume both Z's in the input and succeed, and then leave nothing left for a to parse. Is this what a parsing expression grammar is supposed to do? The pseudocode in Ford's thesis seems to indicate this.
Thanks very much.
-Patrick
Upvotes: 2
Views: 539
Reputation: 672
Thanks for your reply Rici.
I re-read Ford's thesis much more closely, and it reaffirms what you said. PEGs / operator are both ordered and greedy. So the rule presented above is supposed to fail.
-Patrick
Upvotes: 0
Reputation: 241911
In PEGs, disjunctions (alternatives) are indeed ordered. In Ford's thesis, the operator is written / and called "ordered choice", which distinguishes it from the | disjunction operator.
That makes PEGs fundamentally different from CFGs. In particular, given PEG rules a -> b Z
and b -> Z Z / Z
, a
will not match Z Z
.
Upvotes: 1