Reputation: 32429
On this page the author explains how to determine the FOLLOW sets of a CFG. Under the headline Syntax Analysis Goal: FOLLOW Sets he states:
Steps to Make the Follow Set
Conventions: a, b, and c represent a terminal or non-terminal. a* represents zero or more terminals or non-terminals (possibly both). a+ represents one or more... D is a non-terminal.
- Place an End of Input token ($) into the starting rule's follow set.
- Suppose we have a rule R → a*Db. Everything in First(b) (except for ε) is added to Follow(D). If First(b) contains ε then everything in Follow(R) is put in Follow(D).
- Finally, if we have a rule R → a*D, then everything in Follow(R) is placed in Follow(D).
- The Follow set of a terminal is an empty set.
So far so good. But in the box below this item, we read:
[...] Step 2 on rule 1 (N → V = E) indicates that first(=) is in Follow(V).
Now this is the part I don't understand. When he says that First(=) is in Follow (V), he obviously maps = to b and V to D (b and D from the explication in the first box). But (a*)(D)(b)
does not match ()(V)(=)E
.
Am I reading this completely wrong, or did the author maybe write a*Db
instead of a*Dba*
?
(Especially if you read this on wikipedia: "FOLLOW(I) of an Item I [A → α • B β, x] is the set of terminals that can appear after nonterminal B, where α, β are arbitrary symbol strings, and x is an arbitrary lookahead terminal.")
Upvotes: 0
Views: 859
Reputation: 241671
Yes, he meant:
R → a* D b*
and since b*
could be zero symbols, i.e. ε, the second rule is unneeded. Remember that FIRST
is defined on arbitrary sequences of symbols.
In other words, for:
A → α B β
FIRST(β)
is in FOLLOW(B)
, andβ ⇒* ε
, then everything in FOLLOW(A)
is in FOLLOW(B)
.Here's what Aho, Sethi & Ullman say in the dragon book:
Formally, we say
LR(1)
item[A → α·β, a]
is valid for a viable prefix γ if there is a derivation
S ⇒* δAw ⇒ δαβw
whereγ = δα
and eithera
is the first symbol ofw
orw
is ε anda
is$
.
(The ⇒
's above are marked rm
, meaning right-most derivation
; in other words, in every derivation step, the right-most non-terminal is substituted with one of its productions. Consequently, w
only contains terminals.)
In other words, the LR(1)
item is valid (could apply) if we've reached some point where we've decided that A
might be the next reduction and a
might follow A
; at the current point in the parse, we've read α. So if a
follows β, then the reduction is possible. We don't yet know that, unless β is the empty sequence, but we need to remember the fact in case it turns out that β can derive the empty sequence.
I hope that helps. It's late here and I'm too tired to check it again. Maybe tomorrow...
Upvotes: 1