Hyperboreus
Hyperboreus

Reputation: 32429

Determining FOLLOW sets of CFG

On this page the author explains how to determine the FOLLOW sets of a CFG. Under the headline Syntax Analysis Goal: FOLLOW Sets he states:

Steps to Make the Follow Set

Conventions: a, b, and c represent a terminal or non-terminal. a* represents zero or more terminals or non-terminals (possibly both). a+ represents one or more... D is a non-terminal.

  1. Place an End of Input token ($) into the starting rule's follow set.
  2. Suppose we have a rule R → a*Db. Everything in First(b) (except for ε) is added to Follow(D). If First(b) contains ε then everything in Follow(R) is put in Follow(D).
  3. Finally, if we have a rule R → a*D, then everything in Follow(R) is placed in Follow(D).
  4. The Follow set of a terminal is an empty set.

So far so good. But in the box below this item, we read:

[...] Step 2 on rule 1 (N → V = E) indicates that first(=) is in Follow(V).

Now this is the part I don't understand. When he says that First(=) is in Follow (V), he obviously maps = to b and V to D (b and D from the explication in the first box). But (a*)(D)(b) does not match ()(V)(=)E.

Am I reading this completely wrong, or did the author maybe write a*Db instead of a*Dba*?

(Especially if you read this on wikipedia: "FOLLOW(I) of an Item I [A → α • B β, x] is the set of terminals that can appear after nonterminal B, where α, β are arbitrary symbol strings, and x is an arbitrary lookahead terminal.")

Upvotes: 0

Views: 859

Answers (1)

rici
rici

Reputation: 241671

Yes, he meant:

R → a* D b*

and since b* could be zero symbols, i.e. ε, the second rule is unneeded. Remember that FIRST is defined on arbitrary sequences of symbols.

In other words, for:

A → α B β

  1. Every terminal in FIRST(β) is in FOLLOW(B), and
  2. If β ⇒* ε, then everything in FOLLOW(A) is in FOLLOW(B).

Here's what Aho, Sethi & Ullman say in the dragon book:

Formally, we say LR(1) item [A → α·β, a] is valid for a viable prefix γ if there is a derivation
S ⇒* δAw ⇒ δαβw
where γ = δα and either a is the first symbol of w or w is ε and a is $.

(The 's above are marked rm, meaning right-most derivation; in other words, in every derivation step, the right-most non-terminal is substituted with one of its productions. Consequently, w only contains terminals.)

In other words, the LR(1) item is valid (could apply) if we've reached some point where we've decided that A might be the next reduction and a might follow A; at the current point in the parse, we've read α. So if a follows β, then the reduction is possible. We don't yet know that, unless β is the empty sequence, but we need to remember the fact in case it turns out that β can derive the empty sequence.

I hope that helps. It's late here and I'm too tired to check it again. Maybe tomorrow...

Upvotes: 1

Related Questions