Dr.Kameleon
Dr.Kameleon

Reputation: 22820

Precedence issue with Bison

OK, so I'm experimenting a bit with a parser for an... experimental language.

The main idea is, this:

this is a command
this is [another] command

should output this:

   push: command
   push: a
   push: is
   push: this
   push: command
   <start:inline>
   push: another
   <end:inline>
   push: is
   push: this

Here's my lexer:

%option noyywrap

%{
#include <stdio.h>

#define YY_DECL int yylex()

#include "parser3.tab.h"

%}

%%

":"                     { return COLON; }
,(\n+)?                 { return COMMA; }

\[                      { return INLINE_START; }
\]                      { return INLINE_END; }
\{                      { return DEFERRED_START; }
\}?                     { return DEFERRED_END; }
\n|\|                   { return NL; }
[ \t]+                  { }
[a-zA-Z0-9!@#$%^&\*\-\+\/\\\"\'\(\)]+   { yylval.str=strdup(yytext); return ID; }

%%

And here is my parser:

%{

#include <stdio.h>
#include <stdlib.h>

extern int yylex();
extern int yyparse();
extern FILE* yyin;

void yyerror(const char* s);
%}

%union {
    char* str;
}

%token NL 
%token <str> ID
%token COLON
%token BAR
%token COMMA
%token <str> INLINE_START INLINE_END
%token <str> DEFERRED_START DEFERRED_END

%type <str> argument
%type <str> block_start block_end

/****************************************
 Options
 ****************************************/

%glr-parser
%define parse.error verbose
%locations

%start program

%%

block_start : INLINE_START                  { printf("<start:inline>\n"); }
            | DEFERRED_START                { printf("found START\n"); }
            ;

block_end : INLINE_END                      { printf("<end:inline>\n"); }
          | DEFERRED_END                    { printf("found END\n"); }
          ;

block : block_start statement_list block_end { /*printf("found block\n");*/ }
      ;

argument : ID                               
         | block                            { $$ = ""; }
         | COMMA                            { $$ = ""; }                    
         ;

command : argument                          { if (strcmp($argument,"")) printf("  push: \033[1;36m%s\033[0m\n", $argument); }
        | argument command                  { if (strcmp($argument,"")) printf("  push: \033[1;36m%s\033[0m\n", $argument); }
        ;

label : ID COLON command                    { printf("> store: \033[1;36m%s\033[0m\n", $ID); }
      ;

statement_separator : NL
                    ;

statement : command
          | label
          ;

statement_list : statement
               | statement_list statement_separator statement
               | statement_list statement_separator
               ;

program : statement_list
        ;

%%

int main(int argc, char** argv) {
    yyin = fopen(argv[1],"r");

    printf("parsing: %s\n",argv[1]);

    do {
        yyparse();
    } while(!feof(yyin));

    return 0;
}

void yyerror(const char* s) {
    fprintf(stderr, "Parse error: %s\n", s);
    exit(1);
}

Currently the output is:

parsing: test6.art
  push: command
  push: a
  push: is
  push: this
<start:inline>
  push: another
<end:inline>
  push: command
  push: is
  push: this

Any ideas how I can force it to see "command" and "is" tokens first (in this specific order) in the last command and then the block?

Upvotes: 0

Views: 124

Answers (1)

rici
rici

Reputation: 241911

Nothing in bison, not even a GLR parser, can make actions execute out of order. The action for a production is performed precisely when that production is recognised, no sooner and no later.

It's well known that you can reverse list actions by using a right-recursive list:

list_of_things: thing                { show(thing); }
              | thing list_of_things { show(thing); }

But note that even though the action seems to be about printing thing, the action is the action for list_of_things, and the reason that the things are printed in reverse order is that they are being added to the list in reverse order, not that they are being recognised in reverse order. In other words, if you printed the thing in the reduction action for thing itself, you would see the things printed in original order, although they would still be added to the list in reverse order (since adding to the list is in the list action).

So in your grammar, the arguments to command are being printed by the command action, unless they are blocks; if they are blocks, they are printed in the block actions. That causes the printout to be scrambled: All the blocks are printed first, in left-to-right order, because those reductions take place before the first command reduction; when the accumulated stack of command productions finally start being reduced, the other arguments are then printed in right-to-left order.

I don't think there is a simple "Change this and it will magically work" solution here. Personally, I'd build a linked list of arguments, including the block arguments. In effect, that would be a kind of AST, conforming to the usual advice that building ASTs during a parse is almost always the right thing to do, even though in the beginning it seems like taking a short-cut is appealing.

There is also the alternative of building up the string to be printed in the block actions, so that the semantic value of the block can be set to the string which should eventually be printed (instead of the empty string). Then the action for command could just print the argument. But that seems to me a lot clunkier than building the AST.

Upvotes: 3

Related Questions