Reputation: 21
I am trying to build a compiler for a college projec. Currently I am having problems to create a part of my predict set for the LL(1) grammar. This is the part of my grammar that I am having problems with:
<DECLS> ::- id <DECL> <N_DECL>
<N_DECL> ::- op_, <DECLS> | null
<DECL> ::- <SOMETHING_MORE>
The first sets for this part of the grammar are:
<DECLS> = id
<N_DECL> = op_ , null
The follow sets for this part of the grammar are:
<DECLS> = op_; , op_,
<N_DECL> = op_; , op_,
The predict set for this part of the grammar is (see as a matrix):
id op_,
<DECLS> id <DECL> <N_DECL>
<N_DECL> op_, <DECLS> / null <--- here
As you can see the entry for N_DECL
and op_
has two entries. As far as I know, a grammar is wrong if the predict set has two or more entries in the same box. So I was wondering if someone can help me changing my grammar to solve this problem.
Upvotes: 2
Views: 689
Reputation:
If I understand your notation correctly, then
<N_DECL>
) = { op_,
, null
}<N_DECL>
) = { op_;
, op_,
}I don't see where op_,
in <N_DECL>
's FOLLOW set comes from (perhaps it's coming from a different part of your grammar), but if it's really there, your grammar would be ambiguous due to a FIRST/FOLLOW conflict. These occur whenever the FIRST set of a non-terminal contain null
(i.e. the non-terminal can produce the empty string), and the FIRST and FOLLOW set of that non-terminal overlap. In this case, they both contain op_,
, which causes problems as you have noticed.
From the fact that you list op_,
in <DECLS>
's FOLLOW set, I deduce that the actual ambiguity arises from a part of the grammar that you didn't include in your question. If you're parsing a <DECLS>
with a lookahead of one token, encountering a comma may mean that you have another op_, id <DECL> <N_DECL>
phrase, or that that comma belongs to a structure where the comma is "outside" the <DECLS>
.
It might be that your language is LL(1), but without knowing why <DECLS>
can be followed by op_,
, I wouldn't know how to resolve this conflict. However, if it turns out that your language isn't LL(1) as it is tokenised now, you should consider tricking the tokeniser to do the lookahead for you:
<N_DECL> ::- op_, id <DECL> <N_DECL> | null
.op_, id
as its own token comma_id
.<N_DECL> ::- comma_id <DECL> <N_DECL> | null
.Upvotes: 1