paulotorrens
paulotorrens

Reputation: 2321

Solving bison conflict over 2nd lookahead

I'm writing a parser for a project and got stuck on an issue. Here's a self contained example of the problem:

%error-verbose

%token ID
%token VAR
%token END_VAR
%token CONSTANT
%token AT
%token VALUE
%%

unit: regular_var_decl
    | direct_var_decl;

regular_var_decl: VAR constant_opt ID ':' VALUE ';' END_VAR;

constant_opt: /* empty */ | CONSTANT;

direct_var_decl: VAR ID AT VALUE ':' VALUE ';' END_VAR;

%%
#include <stdlib.h>
#include <stdio.h>

yylex() {
  static int i = 0;

  static int tokens[] = {
    VAR,
      ID, ':', VALUE, ';',
    END_VAR,
    0
  };

  return tokens[i++];
};

yyerror(str) char *str; {
  printf("fail: %s\n", str);
};

main() {
  yyparse();
  return 0;
};

One could build it bison test.y && cc test.tab.c && ./a.out.

It warns me that constant_opt is useless due to conflicts.

This ambiguity could be solved by using LALR(2), since after ID it could find ':' or AT... How could I solve this issue on bison?

Upvotes: 0

Views: 508

Answers (2)

Chris Dodd
Chris Dodd

Reputation: 126408

Another solution is to factor it further:

var_decl: VAR constant_opt ID direct_opt ':' VALUE ';' END_VAR;

constant_opt: /* empty */ | CONSTANT;

direct_opt: /* empty */ | AT VALUE;

Then in your action for var_decl, you decide if it's regular, constant, or direct, or issue an error if it has both CONSTANT and AT VALUE. This has the advantage that you can give a custom, clear error message for the latter case, rather than just a generic "syntax error" message.

Upvotes: 1

rici
rici

Reputation: 241861

A simple solution is to just not abbreviate the optional CONSTANT:

regular_var_decl:  VAR ID ':' VALUE ';' END_VAR;
constant_var_decl: VAR CONSTANT ID ':' VALUE ';' END_VAR;
direct_var_decl:   VAR ID AT VALUE ':' VALUE ';' END_VAR;

That allows the reduction decision to be deferred until enough information is known. (You could factor ':' VALUE ';' END_VAR into a non-terminal if that were useful.)

Another possibility is leave the grammar as it was, and ask bison to produce a GLR parser (%glr-parser). The GLR parser will effectively retain two (or more) parallel parses until the ambiguity can be resolved, which should certainly fix the constant_opt problem. (Note that the shift/reduce conflicts are still reported by bison; it cannot tell whether the language is actually ambiguous until an ambiguous sentence is discovered at runtime.) Much of the time, no additional change to the grammar needs to be made, but it does slow the parse down a little bit.

A final possibility, probably less useful here, is to accept a superset of the language and then issue an error message in an action:

var_decl: VAR const_opt ID at_value_opt ':' VALUE ';' END_VAR {
   if (/* pseudocode */ $2 && $4) { /* flag a syntax error */ }
}

That depends on the two opt terminals returning a semantic value which can be interrogated somehow for empty.

Upvotes: 1

Related Questions