alecbz
alecbz

Reputation: 6498

hand-written top down parser

I came across this answer for writing a top-down parser: Is there an alternative for flex/bison that is usable on 8-bit embedded systems? but I'm confused on a few points.

Say I have this grammar:

Expr = Id | Id '(' Expr ')'
Stmt = Id '=' Expr | Expr

I'm not sure if it's strictly necessary but I guess we can left-factor the grammar:

Expr = Id ExprRest
ExprRest = ϵ | '(' Expr ')'
Stmt = Id '=' Expr ';' | Expr ';'

How exactly would I write code that properly parses foo(x); as a Stmt? If we write the Stmt parsing code like:

func ParseStmt() {
  if ParseId() {
    return ParseTerminal('=') && ParseExpr() && ParseTerminal(';');
  } else {
    return ParseExpr() && ParseTerminal(';');
  }
}

We will see the Id foo, assume we are in the first case, and then fail because we fail to find a = following foo.

Is this grammar not LL(1)?

Upvotes: 1

Views: 283

Answers (2)

Ira Baxter
Ira Baxter

Reputation: 95430

Your problem is that ID(xxx) can appear in both statement and expression contexts.

As the other answer says, you have to factor more. It isn't that hard. Write your grammar like this:

Stmt = Id  ( '=' Expr ';' |
             RestOfExpression );

 RestOfExpression = '(' Expr ')' |
                     ϵ );
 Expr =  Id  RestOfExpression;

The code should be obvious.

Upvotes: 0

rici
rici

Reputation: 241971

That grammar is not LL(1), because an LL(1) grammar needs to be able to determine which production to expand given only the parsing stack and the look-ahead token. With an empty parsing stack, you can't tell which of the two Stmt productions to use given the Id lookahead token. You could left-factor it, but it gets annoying.

The executables created by bison, although they occupy a lot of lines of C, are actually pretty compact. Your grammar, compiled by bison, produced 1559 lines of C, but compiled into an object file of 4K, of which (I believe) only slightly more than 1K corresponds to actual code and data. (Of course, that's without actions.)

If you really want to hand-code a parser for an expression grammar, I'd suggest you use either the bottom-up 'Shunting Yard' algorithm or a top-down 'Pratt parser' (both of which are readily searchable with Google), which handle things like operator precedence better than a strict LL grammar. But you might well find that the bison-generated grammar is of comparable size and speed, and better readability and maintainability. (Or you might not. Tastes differ.)

Upvotes: 1

Related Questions