Reputation: 1808
I understand how an LL recursive descent parser can handle rules of this form:
A = B*;
with a simple loop that checks whether to continue looping or not based on whether the lookahead token matches a terminal in the FIRST set of B. However, I'm curious about table based LL parsers: how can rules of this form work there? As far as I know, the only way to handle repetition like this in one is through right-recursion, but that messes up associativity in cases where a right-associative parse tree is not desired.
I'd like to know because I'm currently attempting to write an LL(1) table-based parser generator and I'm not sure how to handle a case like this without changing the intended parse tree shape.
Upvotes: 6
Views: 750
Reputation: 194
Yes, this is definitely possible. The standard method of rewriting to BNF and constructing a parse table is useful for figuring out how the parser should work – but as far as I can tell, what you're asking is how you can avoid the recursive part, which would mean that you'd get the slanted binary tree/linked list form of AST.
If you're hand-coding the parser, you can simply use a loop, using the lookaheads from the parse table that indicate a recursive call to decide to go around the loop once more. (I.e., you could just use while
with those lookaheads as the condition.) Then for each iteration, you simply append the constructed subtree as a child of the current parent. In your case, then, A
would get several direct B
-children.
Now, as I understand it, you're building a parser generator, and it might be easiest to follow the standard procedure, going via plan BNF. However, that's not really an issue; there is no substantive difference between iteration and recursion, after all. You simply have to have a class of “helper rules” that don't introduce new AST nodes, but that rather append their result to the node of the nonterminal that triggered them. So when turning the repetition into X -> BX
, rather than constructing X
nodes, you have your X
rule extend the child-list of the A
or X
(whichever triggered it) by its own children. You'll still end up with A
having several B
children, and no X
nodes in sight.
Upvotes: 1
Reputation: 318
Let's expand your EBNF grammar to simple BNF and assume, that b
is a terminal and <e>
is an empty string:
A -> X
X -> BX
X -> <e>
B -> b
This grammar produces strings of terminal b
's of any length.
To construct the table, we will need to generate the first and follow sets (constructing an LL(1) parsing table).
First(α)
is the set of terminals that begin strings derived from any string of grammar symbols α
.
First(A) : b, <e>
First(X) : b, <e>
First(B) : b
Follow(A)
is the set of terminals a that can
appear immediately to the right of a nonterminal A
.
Follow(A) : $
Follow(X) : $
Follow(B) : b$
We can now construct the table based on the sets, $
is the end of input marker.
+---+---------+----------+
| | b | $ |
+---+---------+----------+
| A | A -> X | A -> X |
| X | X -> BX | X -> <e> |
| B | B -> b | |
+---+---------+----------+
The parser action always depends on the top of the parse stack and the next input symbol.
$
on top of the parse stack:
$
is the input symbol: accept input$
is not the input symbol: parse errorLet us analyze the input bb
. The initial parse stack contains the start symbol and the end marker A $
.
+-------+-------+-----------+
| Stack | Input | Action |
+-------+-------+-----------+
| A $ | bb$ | A -> X |
| X $ | bb$ | X -> BX |
| B X $ | bb$ | B -> b |
| b X $ | bb$ | consume b |
| X $ | b$ | X -> BX |
| B X $ | b$ | B -> b |
| b X $ | b$ | consume b |
| X $ | $ | X -> <e> |
| $ | $ | accept |
+-------+-------+-----------+
As you can see, rules of the form A = B*
can be parsed without problems. The resulting concrete parse tree for input bb
would be:
Upvotes: 3