Reputation: 1467
The bison grammar I wrote for parsing a text file gives me 10 shift/reduce conflicts. The parser.output file doesn't help me enough. The file gives me information as:
State 38 conflicts: 5 shift/reduce
State 40 conflicts: 4 shift/reduce
State 46 conflicts: 1 shift/reduce
Grammar
0 $accept: session $end
1 session: boot_data section_start
2 boot_data: head_desc statement head_desc head_desc
3 head_desc: OPEN_TOK BOOT_TOK statement CLOSE_TOK
4 | OPEN_TOK statement CLOSE_TOK
5 statement: word
6 | statement word
7 word: IDENTIFIER
8 | TIME
9 | DATE
10 | DATA
11 section_start: section_details
12 | section_start section_details
13 | section_start head_desc section_details
14 $@1: /* empty */
15 section_details: $@1 section_head section_body section_end
16 section_head: START_TOK head_desc START_TOK time_stamp
17 time_stamp: statement DATE TIME
18 section_body: log_entry
19 | section_body log_entry
20 log_entry: entry_prefix body_statements
21 | entry_prefix TIME body_statements
22 body_statements: statement
23 | head_desc
24 entry_prefix: ERROR_TOK
25 | WARN_TOK
26 | /* empty */
27 $@2: /* empty */
28 section_end: END_TOK statement $@2 END_TOK head_desc
state 38
8 word: TIME .
21 log_entry: entry_prefix TIME . body_statements
OPEN_TOK shift, and go to state 1
TIME shift, and go to state 6
DATE shift, and go to state 7
DATA shift, and go to state 8
IDENTIFIER shift, and go to state 9
OPEN_TOK [reduce using rule 8 (word)]
TIME [reduce using rule 8 (word)]
DATE [reduce using rule 8 (word)]
DATA [reduce using rule 8 (word)]
IDENTIFIER [reduce using rule 8 (word)]
$default reduce using rule 8 (word)
head_desc go to state 39
statement go to state 40
word go to state 11
body_statements go to state 45
state 39
23 body_statements: head_desc .
$default reduce using rule 23 (body_statements)
state 40
6 statement: statement . word
22 body_statements: statement .
TIME shift, and go to state 6
DATE shift, and go to state 7
DATA shift, and go to state 8
IDENTIFIER shift, and go to state 9
TIME [reduce using rule 22 (body_statements)]
DATE [reduce using rule 22 (body_statements)]
DATA [reduce using rule 22 (body_statements)]
IDENTIFIER [reduce using rule 22 (body_statements)]
$default reduce using rule 22 (body_statements)
word go to state 19
state 46
9 word: DATE .
17 time_stamp: statement DATE . TIME
TIME shift, and go to state 48
TIME [reduce using rule 9 (word)]
$default reduce using rule 9 (word)
The equivalent part of my grammar is:
statement : word
{
printf("WORD\n");
$$=$1;
}
|statement word
{
printf("STATEMENTS\n");
$$=$1;
printf("STATEMENT VALUE== %s\n\n",$$);
}
;
word : IDENTIFIER
{
printf("IDENTIFIER\n");
$$=$1;
}
|TIME
{
printf("TIME\n");
$$=$1;
}
|DATE
{
printf("DATE\n");
$$=$1;
}
|DATA
{
}
;
section_start : section_details
{
printf("SINGLE SECTIONS\n");
}
|section_start section_details
{
printf("MULTIPLE SECTIONS\n");
}
|section_start head_desc section_details
;
section_details :
{
fprintf(fp,"\n%d:\n",set_count);
}
section_head section_body section_end
{
printf("SECTION DETAILS\n");
set_count++;
}
;
section_head : START_TOK head_desc START_TOK statement time_stamp
{
printf("SECTION HEAD...\n\n%s===\n\n%s\n",$2,$4);
fprintf(fp,"%s\n",$4);
}
;
time_stamp : DATET TIME
{
}
;
section_body :log_entry
{
}
|section_body log_entry
{
}
;
log_entry : entry_prefix body_statements
{
}
|entry_prefix TIME body_statements
{
}
;
body_statements : statement
{
}
|head_desc
{
}
;
Please help me to fix this..
Thanks
Upvotes: 2
Views: 5890
Reputation: 126120
A conflict in a yacc/bison parser means that the grammar is not LALR(1) -- which generally means that something is either ambiguous or needs more than 1 token of lookahead. The default resolution is to choose always shifting over reducing, or choose always reducing the first rule (for reduce/reduce conflicts), which results in a parser that will recognize a subset of the language described by the grammar. That may or may not be ok -- in the case of an ambiguous grammar, it is often the case that the "subset" is in fact the entire language, and the default resolution trims off the ambiguous case. For cases that require more lookahead, however, as well as some ambiguous cases, the default resolution will result in failing to parse some things in the language.
To figure out what is going wrong on any given conflict, the .output
file generally gives you all you need. In your case, you have 3 state with conflicts -- generally the conflicts in a single state are all a single related issue.
state 38
8 word: TIME .
21 log_entry: entry_prefix TIME . body_statements
This conflict is an ambiguity between the rules for log_entry
and body_statements
:
log_entry: entry_prefix body_statements
| entry_prefix TIME body_statements
a body_statements
can be a sequence of one or more TIME
/DATE
/DATA
/IDENTIFIER
tokens, so when you have an input with (eg) entry_prefix TIME DATA
, it could be either the first log_entry
rule with TIME DATA
as the body_statements
or the second log_entry
rule with just DATA
as the body_statements
.
The default resolution in this case will favor the second rule (shifting to treat the TIME
as part of log_statements
rather than reducing it as word
to be part of a body_statements
), and will result in a "subset" that is the entire language -- the only parses that will be missed are ambiguous. This is a case analogous to the "dangling else" that shows up in some languages, where the default shift likely does exactly what you want.
To eliminate this conflict, the easiest way is just to get rid of the log_entry: entry_prefix TIME body_statements
rule. This has the opposite effect of the default resolution -- now TIME will always be considered part of the BODY. The issue is that now
you don't have a separate rule to reduce if you want to treat the case of the being an initial TIME
in the body differently. You can do a check in the action for a body that begins with TIME
if you need to do something special.
state 40
6 statement: statement . word
22 body_statements: statement .
This is another ambiguity problem, this time with section_body
where it can't tell where one log_entry
ends and another begins. A section_body
consists of one or more log_entries
, each of which is an entry_prefix
followed by a body_statements
. The body_statements
as noted above may be one or more word
tokens, while an entry_prefix
may be empty. So if you have a section_body
that is just a sequence of word
tokens, it can be parsed as either a single log_entry
(with no entry_prefix
) or as a sequence of log_entry
rules, each with no entry_prefix
. The default shift over reduce resolution will favor putting as many tokens as possible into a single statement
before reducing a body_statement
, so will parse it as a single log_entry
, which is likely ok.
To eliminate this conflict, you'll need to refactor the grammar. Since the tail of any statement
in a log_entry
might be another log_entry
with no entry_prefix
and statement
for the body_statements
, you pretty much need to eliminate that case (which is what the default conflict resolution does). Assuming you've fixed the previous conflict by removing the second log_entry
rule, first unfactor log_entry
to make the problematic case its own rule:
log_entry: ERROR_TOK body_statements
| WARN_TOK body_statements
| head_desc
initial_log_entry: log_entry
| statements
Now change the section_body
rule so it uses the split-off rule for just the first one:
section_body: initial_log_entry
| section_body log_entry
and the conflict goes away.
state 46
9 word: DATE .
17 time_stamp: statement DATE . TIME
This conflict is a lookahead ambiguity problem -- since DATE
and TIME
tokens can appear in a statement
, when parsing a time_stamp
it can't tell where the statement
ends and the terminal DATE
/TIME
begins. The default resolution will result in treating any DATE TIME
pair seen as the end of the time_stamp
. Now since time_stamp
only appears at the end of a section_head
just before a section_body
, and a section_body
may begin with a statement
, this may be fine as well.
So it may well be the case that all the conflicts in your grammar are ignorable, and it may even be desirable to do so, as that would be simpler than rewriting the grammar to get rid of them. On the other hand, the presence of conflicts makes it tougher to modify the grammar, since any time you do, you need to reexamine all of the conflicts to make sure they are still benign.
There's a confusing issue with "the default resolution of a conflict" and "the default action in a state". These two defaults have nothing to do with one another -- the first is a decision made by yacc/bison when it builds the parser, and the second is a decision by the parser at runtime. So when you have a state in the output file like:
state 46
9 word: DATE .
17 time_stamp: statement DATE . TIME
TIME shift, and go to state 48
TIME [reduce using rule 9 (word)]
$default reduce using rule 9 (word)
This is telling you that bison has determined that the possible actions from this state are to either shift to state 48 or reduce rule 9. The shift action is only possible if the lookahead token is TIME
, while the reduce action is possible for many lookahead tokens including time. So in the interest of minimizing table size, rather than enumerating all the possible next tokens for the reduce action, it just says $default
-- meaning the parser will do the reduce action as long as no previous action matches the lookahead token. The $default
action will always be the last action in the state.
So the parser's code in this state will run down the list of actions until it finds one that matches the lookahead token and does that action. The TIME [reduce
... action is only included to make it clear that there was a conflict in this state and that bison resolved it by disallowing the action reduce
when the lookahead is TIME
. So the actual action table constructed for this state will have just a single action (shift on token TIME
) followed by a default action (reduce on any token).
Note that it does this despite the fact that not all tokens are legal after a word
, it still does the reduction on any token. That's because even if the next token is something illegal, since the reduction doesn't read it from the input (it's still the lookahead), a later state (after potentially multiple default reductions) will see (and report) the error.
Upvotes: 7