How to identify whether a grammar is LR(n), LL(n)

Question

For a language that is not LL(1) or LR(1) how can one try to find out if some number n exists such that the grammar can be LL(n) or LR(n)?

You check if a grammar is LR(0) by looking at the canonical collection of LR(0) items. Then, assuming it wasn't LR(0), you can check if it is LR(1) by introducing the lookahead symbol. My simple reasoning tells me that, to check whether it is LR(2) or not, you probably have to make the lookahead contain the next two symbols instead of just one. For LR(3) you have to take three symbols into consideration etc.

Even if this is the case, even though I doubt it, I am struggling to think of how can one try to identify (or even get a hint at) an n, or the non-existence thereof, for which a specific grammar can be LR(n) and/or LL(n) without checking incrementally from an arbitary LR(m) upwards.

rici · Accepted Answer

If a language is LR(k) for some k>1, then it is LR(1). (That's not true for a grammar, of course.) That is, if you have an LR(k) grammar for a language, then you can mechanically construct an LR(1) grammar which allows you to recover the original parse tree. This is not true of LL(k); LL(k) languages are a strict subset of LL(k+1) languages.
The test you propose will indeed let you decide whether a grammar is LR(k) for some given k (or LL(k)). Unfortunately, there's no way of figuring out the smallest possible value of k other than the successive search you propose, and there is no guarantee that the search will ever terminate.
Although the problem is hard (or impossible) in the general case, it can often be answered for specific grammars, by considering the possible valid successors of a grammar state which exhibits conflicts.

In most real-world grammars, there will only be a few conflicts, so manual examination of conflicting states is possible. In general terms, one needs to figure out the path which led to the conflicting state, and the possible continuations. In many cases it will be clear that the parsing conflict could be resolved with slightly more lookahead.

A large class of grammars where this will fail is the set of ambiguous grammars. An ambiguous grammar cannot be LR(k) (or LL(k)) for any k. Again, the question of whether a grammar is ambiguous is not decidable but effective heuristics exist, some of which are included in commercial products.

Again, it is often easy to find ambiguities in real-world grammars, either by visual inspection (as above), or by feeding a lot of valid texts into a GLR parser (such as the one produced by bison) until an ambiguity is reported. (Alternatively, you can enumerate valid texts from the grammar with a straight-forward algorithm, and see if a text appears twice in the enumeration.)

Here are a couple of possibly relevant SO questions illustrating analysis techniques. I'm sure there are more.

A yacc shift/reduce conflict on an unambiguous grammar

Bison reduce/reduce situation

yacc shift-reduce for ambiguous lambda syntax

How to understand and fix conflicts in PLY

How to identify whether a grammar is LR(n), LL(n)

Answers (1)

Related Questions