Amol Borkar
Amol Borkar

Reputation: 2609

How to check if a grammar is LL(1) and not ambigious

This is my first attempt of writing a grammar for a new (python-like) language I'm trying to parse using a Recursive Descent parser. I have no background on EBNF, context-free grammar and parsing and I'm finding it really difficult to clearly define what I want.

For example, take the Expression production which can lead to an Assignment. But what if the output of an expression is being assigned to the identifier? To accommodate for this I've added the <Identifier> '=' Expression rule, but I think this will lead to infinite recursion once I implement my Parser.

I'm basically looking for pointers on how define a Grammar that is as succinct as possible and how to ensure that my Grammar is unambigious. Any tips/resources are greatly appreciated.

Statement

Statement ::= 
          |   'if' Condition ':' Statement
          |   'if' Condition ':' Statement 'else:' Statement
          |   Expression
          |   Function

Expression

Expression ::= Assignment
           |   MathExpression
           |   Condition

Assignment

Assignment ::= <Identifier> '=' Term
           |   <Identifier> '=' Expression
           |   <Identifier> '=' Function

MathExpression

MathExpression ::= Term ( '+' | '-' ) Expression
               |   Term ( '*' | '/' ) Expression
               |   Term

Condition

Condition ::= Term ( '<' | '>' | '<=' | '>=' } MathExpression
          |   Term ( '<' | '>' | '<=' | '>=' } Term

Term

Term ::= <Identifier>
         |   <Literal>

Upvotes: 0

Views: 717

Answers (1)

Michael Dyck
Michael Dyck

Reputation: 2458

The straightforward way to check if a grammar is LL(1) is to attempt to generate the LL(1) parsing tables for it. If that succeeds, then the grammar is LL(1) (and therefore unambiguous).

However, you say you want to use a recursive descent parser, not an LL(1) parser, so ensuring the grammar is LL(1) (or even unambiguous) might not be necessary. If you write the parser by hand, you can basically do whatever you want, as long as it works. On the other hand, if you're going to generate it with a parser-generator, then your grammar will need to conform to its constraints, which might be different from LL(1).

Anyway, your grammar is ambiguous (and therefore not LL(1)). For example, the Statement production has the classic 'dangling else' ambiguity. For another, the derivation Expression -> MathExpression -> Term means that there are two ways for Assignment to derive <Identifier> '=' Term and two ways for Condition to derive (e.g.) Term '<' Term

You can resolve the latter problems by eliminating some productions:

  • Assignment ::= <Identifier> '=' Term
  • Condition ::= Term ( '<' | '>' | '<=' | '>=' } Term You should convince yourself that this doesn't change the language generated by the grammar.

For the dangling else problem, that depends. If you're writing the recursive descent parser by hand, then one possibility is to leave it ambiguous in the grammar, but take a particular action in the parser when the next symbol is 'else'. If you're using a parser-generator, there's a fair chance that its documentation will tell you whether/how you can use it to resolve the ambiguity.

Even if you deal with the ambiguities, this grammar might not work well as the basis for an RD parser, because alternatives are not easily distinguishable. E.g., at the start of the function for Expression, if the next symbol is an <Identifier>, that could be the start of an Assignment or a MathExpression or a Condition, so which of those functions do you call? (An LL(1) parser-generator would complain that the alternatives do not have disjoint FIRST sets.) On the other hand, I think you could resolve the problem by looking at the symbol after the <Identifier> (i.e., you'd need at least a 2-symbol lookahead).

Re your worry that the derivation Expression -> Assignment -> <Identifier> '=' Expression will lead to infinite recursion: it will certainly lead to recursion, but not infinite recursion. Every recursive invocation of the Expression-parsing function consumes an <Identifier> and an '=', and a program will have only finitely many of those, so the recursion must 'bottom out' eventually.

Upvotes: 0

Related Questions