Reputation: 1808

Can a table-based LL parser handle repetition without right-recursion?

I understand how an LL recursive descent parser can handle rules of this form:

A = B*;

with a simple loop that checks whether to continue looping or not based on whether the lookahead token matches a terminal in the FIRST set of B. However, I'm curious about table based LL parsers: how can rules of this form work there? As far as I know, the only way to handle repetition like this in one is through right-recursion, but that messes up associativity in cases where a right-associative parse tree is not desired.

I'd like to know because I'm currently attempting to write an LL(1) table-based parser generator and I'm not sure how to handle a case like this without changing the intended parse tree shape.

Upvotes: 6

Answers (2)

Magnus Lie Hetland

Reputation: 194

Yes, this is definitely possible. The standard method of rewriting to BNF and constructing a parse table is useful for figuring out how the parser should work – but as far as I can tell, what you're asking is how you can avoid the recursive part, which would mean that you'd get the slanted binary tree/linked list form of AST.

If you're hand-coding the parser, you can simply use a loop, using the lookaheads from the parse table that indicate a recursive call to decide to go around the loop once more. (I.e., you could just use while with those lookaheads as the condition.) Then for each iteration, you simply append the constructed subtree as a child of the current parent. In your case, then, A would get several direct B-children.

Now, as I understand it, you're building a parser generator, and it might be easiest to follow the standard procedure, going via plan BNF. However, that's not really an issue; there is no substantive difference between iteration and recursion, after all. You simply have to have a class of “helper rules” that don't introduce new AST nodes, but that rather append their result to the node of the nonterminal that triggered them. So when turning the repetition into X -> BX, rather than constructing X nodes, you have your X rule extend the child-list of the A or X (whichever triggered it) by its own children. You'll still end up with A having several B children, and no X nodes in sight.

Upvotes: 1

Jen-Ya

Reputation: 318

The Grammar

Let's expand your EBNF grammar to simple BNF and assume, that b is a terminal and <e> is an empty string:

A -> X
X -> BX
X -> <e>
B -> b

This grammar produces strings of terminal b's of any length.

The LL(1) Table

To construct the table, we will need to generate the first and follow sets (constructing an LL(1) parsing table).

First sets

First(α) is the set of terminals that begin strings derived from any string of grammar symbols α.

First(A) : b, <e>
First(X) : b, <e>
First(B) : b

Follow sets

Follow(A) is the set of terminals a that can appear immediately to the right of a nonterminal A.

Follow(A) : $
Follow(X) : $
Follow(B) : b$

Table

We can now construct the table based on the sets, $ is the end of input marker.

+---+---------+----------+
|   |    b    |    $     |
+---+---------+----------+
| A | A -> X  | A -> X   |
| X | X -> BX | X -> <e> |
| B | B -> b  |          |
+---+---------+----------+

The parser action always depends on the top of the parse stack and the next input symbol.

Terminal on top of the parse stack:
1. Matches the input symbol: pop stack, advance to the next input symbol
2. No match: parse error
Nonterminal on top of the parse stack:
1. Parse table contains production: apply production to stack
2. Cell is empty: parse error
$ on top of the parse stack:
1. $ is the input symbol: accept input
2. $ is not the input symbol: parse error

Sample Parse

Let us analyze the input bb. The initial parse stack contains the start symbol and the end marker A $.

+-------+-------+-----------+
| Stack | Input |  Action   |
+-------+-------+-----------+
| A $   | bb$   | A -> X    |
| X $   | bb$   | X -> BX   |
| B X $ | bb$   | B -> b    |
| b X $ | bb$   | consume b |
| X $   | b$    | X -> BX   |
| B X $ | b$    | B -> b    |
| b X $ | b$    | consume b |
| X $   | $     | X -> <e>  |
| $     | $     | accept    |
+-------+-------+-----------+

Conclusion

As you can see, rules of the form A = B* can be parsed without problems. The resulting concrete parse tree for input bb would be:

parse tree

Upvotes: 3