Fix shift/reduce conflict in bison grammar

Question

The bison grammar I wrote for parsing a text file gives me 10 shift/reduce conflicts. The parser.output file doesn't help me enough. The file gives me information as:

State 38 conflicts: 5 shift/reduce
State 40 conflicts: 4 shift/reduce
State 46 conflicts: 1 shift/reduce

Grammar

0 $accept: session $end

1 session: boot_data section_start

2 boot_data: head_desc statement head_desc head_desc

3 head_desc: OPEN_TOK BOOT_TOK statement CLOSE_TOK
4          | OPEN_TOK statement CLOSE_TOK

5 statement: word
6          | statement word

7 word: IDENTIFIER
8     | TIME
9     | DATE
10     | DATA

11 section_start: section_details
12              | section_start section_details
13              | section_start head_desc section_details

14 $@1: /* empty */

15 section_details: $@1 section_head section_body section_end

16 section_head: START_TOK head_desc START_TOK time_stamp

17 time_stamp: statement DATE TIME

18 section_body: log_entry
19             | section_body log_entry

20 log_entry: entry_prefix body_statements
21          | entry_prefix TIME body_statements

22 body_statements: statement
23                | head_desc

24 entry_prefix: ERROR_TOK
25             | WARN_TOK
26             | /* empty */

27 $@2: /* empty */

28 section_end: END_TOK statement $@2 END_TOK head_desc

 state 38

 8 word: TIME .
21 log_entry: entry_prefix TIME . body_statements

 OPEN_TOK    shift, and go to state 1
 TIME        shift, and go to state 6
 DATE        shift, and go to state 7
 DATA        shift, and go to state 8
 IDENTIFIER  shift, and go to state 9

 OPEN_TOK    [reduce using rule 8 (word)]
 TIME        [reduce using rule 8 (word)]
 DATE        [reduce using rule 8 (word)]
 DATA        [reduce using rule 8 (word)]
 IDENTIFIER  [reduce using rule 8 (word)]
 $default    reduce using rule 8 (word)

 head_desc        go to state 39
 statement        go to state 40
 word             go to state 11
 body_statements  go to state 45


state 39

23 body_statements: head_desc .

$default  reduce using rule 23 (body_statements)


state 40

6 statement: statement . word
22 body_statements: statement .

TIME        shift, and go to state 6
DATE        shift, and go to state 7
DATA        shift, and go to state 8
IDENTIFIER  shift, and go to state 9

TIME        [reduce using rule 22 (body_statements)]
DATE        [reduce using rule 22 (body_statements)]
DATA        [reduce using rule 22 (body_statements)]
IDENTIFIER  [reduce using rule 22 (body_statements)]
$default    reduce using rule 22 (body_statements)

word  go to state 19

state 46

9 word: DATE .
17 time_stamp: statement DATE . TIME

TIME  shift, and go to state 48

TIME      [reduce using rule 9 (word)]
$default  reduce using rule 9 (word)

The equivalent part of my grammar is:

statement : word
    {
        printf("WORD
");
        $$=$1;
    }
    |statement word
    {
        printf("STATEMENTS
");
        $$=$1;
        printf("STATEMENT VALUE== %s

",$$);
    }
    ;

 word : IDENTIFIER
    {
        printf("IDENTIFIER
");
        $$=$1;
    }
    |TIME
    {
        printf("TIME
");
        $$=$1;
    }
    |DATE
    {
        printf("DATE
");
        $$=$1;
    }
    |DATA
    {
    }
    ;
section_start : section_details 
    {
        printf("SINGLE SECTIONS
");        
    }
    |section_start section_details
    {
        printf("MULTIPLE SECTIONS
");   
    }
    |section_start head_desc section_details
    ;

section_details :
        {
             fprintf(fp,"
%d:
",set_count); 
        }
         section_head section_body section_end
         {
            printf("SECTION DETAILS
");
             set_count++;

        }
         ;

section_head : START_TOK head_desc START_TOK statement time_stamp
         {
            printf("SECTION HEAD...

%s===

%s
",$2,$4);
            fprintf(fp,"%s
",$4);

         }
         ;
time_stamp : DATET TIME
    {

    }
    ;
section_body :log_entry
         {

         }
        |section_body log_entry
         {

         }
         ;

log_entry : entry_prefix body_statements
         {

         }
         |entry_prefix TIME body_statements
         {

        }
        ;

body_statements : statement
        {

        }
        |head_desc
        {

        }
        ;

Please help me to fix this..

Thanks

Chris Dodd · Accepted Answer

A conflict in a yacc/bison parser means that the grammar is not LALR(1) -- which generally means that something is either ambiguous or needs more than 1 token of lookahead. The default resolution is to choose always shifting over reducing, or choose always reducing the first rule (for reduce/reduce conflicts), which results in a parser that will recognize a subset of the language described by the grammar. That may or may not be ok -- in the case of an ambiguous grammar, it is often the case that the "subset" is in fact the entire language, and the default resolution trims off the ambiguous case. For cases that require more lookahead, however, as well as some ambiguous cases, the default resolution will result in failing to parse some things in the language.

To figure out what is going wrong on any given conflict, the .output file generally gives you all you need. In your case, you have 3 state with conflicts -- generally the conflicts in a single state are all a single related issue.

state 38
 8 word: TIME .
21 log_entry: entry_prefix TIME . body_statements

This conflict is an ambiguity between the rules for log_entry and body_statements:

log_entry: entry_prefix body_statements
         | entry_prefix TIME body_statements

a body_statements can be a sequence of one or more TIME/DATE/DATA/IDENTIFIER tokens, so when you have an input with (eg) entry_prefix TIME DATA, it could be either the first log_entry rule with TIME DATA as the body_statements or the second log_entry rule with just DATA as the body_statements.

The default resolution in this case will favor the second rule (shifting to treat the TIME as part of log_statements rather than reducing it as word to be part of a body_statements), and will result in a "subset" that is the entire language -- the only parses that will be missed are ambiguous. This is a case analogous to the "dangling else" that shows up in some languages, where the default shift likely does exactly what you want.

To eliminate this conflict, the easiest way is just to get rid of the log_entry: entry_prefix TIME body_statements rule. This has the opposite effect of the default resolution -- now TIME will always be considered part of the BODY. The issue is that now you don't have a separate rule to reduce if you want to treat the case of the being an initial TIME in the body differently. You can do a check in the action for a body that begins with TIME if you need to do something special.

state 40
6 statement: statement . word
22 body_statements: statement .

This is another ambiguity problem, this time with section_body where it can't tell where one log_entry ends and another begins. A section_body consists of one or more log_entries, each of which is an entry_prefix followed by a body_statements. The body_statements as noted above may be one or more word tokens, while an entry_prefix may be empty. So if you have a section_body that is just a sequence of word tokens, it can be parsed as either a single log_entry (with no entry_prefix) or as a sequence of log_entry rules, each with no entry_prefix. The default shift over reduce resolution will favor putting as many tokens as possible into a single statement before reducing a body_statement, so will parse it as a single log_entry, which is likely ok.

To eliminate this conflict, you'll need to refactor the grammar. Since the tail of any statement in a log_entry might be another log_entry with no entry_prefix and statement for the body_statements, you pretty much need to eliminate that case (which is what the default conflict resolution does). Assuming you've fixed the previous conflict by removing the second log_entry rule, first unfactor log_entry to make the problematic case its own rule:

log_entry: ERROR_TOK body_statements
         | WARN_TOK body_statements
         | head_desc

initial_log_entry: log_entry
                 | statements

Now change the section_body rule so it uses the split-off rule for just the first one:

section_body: initial_log_entry
            | section_body log_entry

and the conflict goes away.

state 46
9 word: DATE .
17 time_stamp: statement DATE . TIME

This conflict is a lookahead ambiguity problem -- since DATE and TIME tokens can appear in a statement, when parsing a time_stamp it can't tell where the statement ends and the terminal DATE/TIME begins. The default resolution will result in treating any DATE TIME pair seen as the end of the time_stamp. Now since time_stamp only appears at the end of a section_head just before a section_body, and a section_body may begin with a statement, this may be fine as well.

So it may well be the case that all the conflicts in your grammar are ignorable, and it may even be desirable to do so, as that would be simpler than rewriting the grammar to get rid of them. On the other hand, the presence of conflicts makes it tougher to modify the grammar, since any time you do, you need to reexamine all of the conflicts to make sure they are still benign.

There's a confusing issue with "the default resolution of a conflict" and "the default action in a state". These two defaults have nothing to do with one another -- the first is a decision made by yacc/bison when it builds the parser, and the second is a decision by the parser at runtime. So when you have a state in the output file like:

state 46

9 word: DATE .
17 time_stamp: statement DATE . TIME

TIME  shift, and go to state 48

TIME      [reduce using rule 9 (word)]
$default  reduce using rule 9 (word)

This is telling you that bison has determined that the possible actions from this state are to either shift to state 48 or reduce rule 9. The shift action is only possible if the lookahead token is TIME, while the reduce action is possible for many lookahead tokens including time. So in the interest of minimizing table size, rather than enumerating all the possible next tokens for the reduce action, it just says $default -- meaning the parser will do the reduce action as long as no previous action matches the lookahead token. The $default action will always be the last action in the state.

So the parser's code in this state will run down the list of actions until it finds one that matches the lookahead token and does that action. The TIME [reduce... action is only included to make it clear that there was a conflict in this state and that bison resolved it by disallowing the action reduce when the lookahead is TIME. So the actual action table constructed for this state will have just a single action (shift on token TIME) followed by a default action (reduce on any token).

Note that it does this despite the fact that not all tokens are legal after a word, it still does the reduction on any token. That's because even if the next token is something illegal, since the reduction doesn't read it from the input (it's still the lookahead), a later state (after potentially multiple default reductions) will see (and report) the error.

Fix shift/reduce conflict in bison grammar

Answers (1)

Related Questions