slavasav
slavasav

Reputation: 71

Bison, C++ GLR parsing: how to force shift\reduce conflict?

How can I force the shift\reduce conflict to be resolved by the GLR method?
Suppose I want the parser to resolve the conflict between the right shift operator and two closing angle brackets of template arguments for itself. I make the lexer pass the 2 consecutive ">" symbols, as separate tokens, without merging them into one single ">>" token. Then i put these rules to the grammar:

operator_name:  
     "operator" ">"  
   | "operator" ">" ">"  
;  

I want this to be a shift\reduce conflict. If I have the token declaration for ">" with left associativity, this will not be a conflict. So I have to remove the token precedence\associativity declaration, but this results in many other conflicts that I don't want to solve manually by specifying the contextual precedence for each conflicting rule. So, is there a way to force the shift\reduce conflict while having the token declared?

Upvotes: 5

Views: 1526

Answers (3)

Jarek C
Jarek C

Reputation: 1261

I have solved a similar problem (in my own proprietary language) and my solution was in cooperation between the scanner and the parser.

Entering a template argument list was notified by templListLevel++ and leaving the list by templListLevel--. For example, input:

null<setof<int>>

worked as:

"null" "<" {level++} "setof" "<" {level++} "int" ">" {level--} ">" {level--}

The same templListLevel is accessible to the scanner and if templListLevel>0, the scanner does not recognize >> (the sequence is scanned as two >).

In flex it could look like this:

">>"      { if (!templListLevel) return TOKEN_SHIFTR; unput('>'); return TOKEN_GT; }

Similarly, you need to handle C++'s >= and >>= operators.

This is a representation of idea "when in a template argument list, two consecutive > is not a right-shift operator but two "greater than" tokens".

Then, e.g., a sequence null<int>>>0 is correcly scanned as null, <, setof, <, int, > (leaving templ arg list), >>, 0 (which is non-sense, but just for illustration).

Btw, splitting >> operator to be parsed as two > > tokens is, in my opinion, a wrong way. Because then, > {space} > would be parsed as right-shift operator which should be wrong. Next, you will probably get shift-reduce conflicts between greater than operator > and right-shift operator >> represented by > > (but not sure now).

My language is not using expressions as template arguments, so I didn't need to solve the problem of appearing right-shift operator in argument list, but for C++ parser, entering an expression within template arguments in parser probably need to be noticed by another flag again re-activating recognition of >> (and leaving that expression again re-activating template argument list mode).

Also, I'm not sure if flex start conditions (https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_mono/flex.html#SEC11) could not be used instead using user-defined nesting counter. I have already other similar hacks between scanner and parser, so I simply add this one to the others and didn't consider option of start conditions.

Also, I did not analyze C++ grammar as Tavian is suggesting (https://stackoverflow.com/a/9099999/3736444), that would be helpful, but I needed to solve a different problem in a different language.

Upvotes: 0

umlcat
umlcat

Reputation: 4143

I worked in Yacc (similar to Bison), with a similar scenario.

Standard grammars are, sometimes, called "parsing directed by syntax".

This case is, sometimes, called something like "parsing directed by semantics".

Example:

...
// shift operator example
if ((x >> 2) == 0)
...
// consecutive template closing tag example
List<String, List<String>> MyList =
...

Lets remember, our mind works like a compiler. The human mind can compile this, but the previous grammars, can't. Mhhh. Lets see how a human mind, would compile this code.

As you already know, the "x" before the consecutive ">" and ">" tokens indicates an expression or lvalue. The mind thinks "two consecutive greater-than symbols, after an expresion, should become a single shift operator token".

And for the "string" token: "two consecutive greater-than symbols, after a type identifier, should become two consecutive template closing tag tokens".

I think this case cannot be handled by the usual operator precedence, shift or reduce, or just grammars, but using ( "hacking" ) some functions provided by the parser itself.

I don't see an error in your example grammar rule. The "operator" symbol avoids confusing the two cases you mention. The parts that should be concern its the grammars where the shift operator its used, and the consecutive template closing tags are used.

operator_expr_example:  
  lvalue "<<"  lvalue |  
  lvalue ">>"  lvalue |
  lvalue "&&"  lvalue |
;  

template_params:  
  identifier |
  template_declaration_example |
  array_declaration |
  other_type_declaration 
;  

template_declaration_example:  
  identifier "<"  template_params ">"
;  

Cheers.

Upvotes: 1

Tavian Barnes
Tavian Barnes

Reputation: 12922

I believe that using context-dependent precedence on the rules for operator_name will work.

The C++ grammar as specified by the updated standard actually modifies the grammar to accept the >> token as closing two open template declarations. I'd recommend following it to get standard behaviour. For example, you must be careful that "x > > y" is not parsed as "x >> y", and you must also ensure that "foo<bar<2 >> 1>>" is invalid, while "foo<bar<(2 >> 1)>>" is valid.

Upvotes: 2

Related Questions