linrongbin

Reputation: 3361

Bison c++ variant with flex reentrant c++14 example?

I'm trying to write a toy language with the flex/bison toolchain in C++14.

I'm stuck combining Bison's C++ variant values with a reentrant Flex scanner: inside yylex, the compiler cannot find yylval.

My development environment is a MacBook with the latest macOS and Xcode, plus flex 2.6.4 and bison 3.7.1 installed via Homebrew.

For convenience, you can download the project that reproduces the error here: https://github.com/linrongbin16/tree.

Let me walk through this not-so-simple tree project.

First, the makefile:

clean:
    rm *.o *.out *.yy.cc *.yy.hh *.tab.cc *.tab.hh *.output

tree.out: tree.o token.yy.o parser.tab.o
    clang++ -std=c++14 -o tree.out tree.o token.yy.o parser.tab.o

token.yy.cc token.yy.hh: token.l
    flex --debug -o token.yy.cc --header-file=token.yy.hh token.l

parser.tab.cc parser.tab.hh: parser.y
    bison --debug --verbose -Wcounterexamples -o parser.tab.cc --defines=parser.tab.hh parser.y

token.yy.o: token.yy.cc
    clang++ -std=c++14 -g -c token.yy.cc token.yy.hh

parser.tab.o: parser.tab.cc
    clang++ -std=c++14 -g -c parser.tab.cc parser.tab.hh

tree.o: tree.cpp parser.tab.hh token.yy.hh
    clang++ -std=c++14 -g -c tree.cpp

The application is tree.out, which is built from three components: tree, token (the lexer), and parser.

tree component

tree.h defines a simple abstract syntax tree class. I haven't implemented it yet, so it has only a virtual destructor:

#pragma once

struct Tree {
  virtual ~Tree() = default;
};

tree.cpp contains the main function, which takes a file name as a command-line argument, initializes the lexer and parser, and runs the parse:

#include "parser.tab.hh"
#include "token.yy.hh"
#include <cstdio>
#include <cstdlib>

struct Scanner {
  yyscan_t yyscanner;
  FILE *fp;
  YY_BUFFER_STATE yyBufferState;

  Scanner(const char *fileName) {
    yylex_init_extra(this, &yyscanner);
    fp = std::fopen(fileName, "r");
    if (!fp) {
      printf("file %s cannot open!\n", fileName);
      exit(-1);
    }
    yyBufferState = yy_create_buffer(fp, YY_BUF_SIZE, yyscanner);
    yy_switch_to_buffer(yyBufferState, yyscanner);
    yyset_lineno(1, yyscanner);
  }

  virtual ~Scanner() {
    if (yyBufferState) {
      yy_delete_buffer(yyBufferState, yyscanner);
    }
    if (yyscanner) {
      yylex_destroy(yyscanner);
    }
    if (fp) {
      std::fclose(fp);
    }
  }
};

int main(int argc, char **argv) {
  if (argc != 2) {
    printf("missing file name!\n");
    return -1;
  }

  Scanner scanner(argv[1]);
  yy::parser parser(scanner.yyscanner);
  if (parser.parse() != 0) {
    printf("parsing failed!\n");
    return -1;
  }
  return 0;
}

The key point is that I use Bison's C++ variant feature together with Flex's reentrant option, because I want the project to be modern (C++14) and thread-safe. That makes initialization a little more involved, but it should pay off as the project grows.

lexer component

token.l:

%option noyywrap noinput nounput
%option nodefault
%option nounistd
%option reentrant

%{
#include <cstdio>
#include <cstring>
#include "parser.tab.hh"
%}

%%

"+"     { yylval->emplace<int>(yy::parser::token::PLUS); return yy::parser::token::PLUS; }
"-"     { yylval->emplace<int>(yy::parser::token::MINUS); return yy::parser::token::MINUS; }
"*"     { yylval->emplace<int>(yy::parser::token::TIMES); return yy::parser::token::TIMES; }
"/"     { yylval->emplace<int>(yy::parser::token::DIVIDE); return yy::parser::token::DIVIDE; }
"("     { yylval->emplace<int>(yy::parser::token::LPAREN); return yy::parser::token::LPAREN; }
")"     { yylval->emplace<int>(yy::parser::token::RPAREN); return yy::parser::token::RPAREN; }
";"     { yylval->emplace<int>(yy::parser::token::SEMICOLON); return yy::parser::token::SEMICOLON; }
"="     { yylval->emplace<int>(yy::parser::token::EQUAL); return yy::parser::token::EQUAL; }

[a-zA-Z][a-zA-Z0-9]+    { yylval->emplace<std::string>(yytext); return yy::parser::token::ID; }
[0-9]+                  { yylval->emplace<int>(atoi(yytext)); return yy::parser::token::NUM; }

%%

Here I followed the split-symbol section of the Bison manual. (Note: this is where the compile errors occur. I also tried the make_XXX API, which gave me errors as well.)

Flex generates token.yy.cc and token.yy.hh, which I expect to compile into the token.yy.o object.

parser component

parser.y:

%require "3.2"
%language "c++"
%define api.value.type variant
%define api.token.constructor
%define parse.assert
%define parse.error verbose
%define parse.lac full
%locations
%param {yyscan_t yyscanner}

%code top {
#include <memory>
}

%code requires {
#include <memory>
#include "token.yy.hh"
#include "tree.h"
#define SP_NULL (std::shared_ptr<Tree>(nullptr))
}

%token<int> PLUS '+'
%token<int> MINUS '-'
%token<int> TIMES '*'
%token<int> DIVIDE '/'
%token<int> SEMICOLON ';'
%token<int> EQUAL '='
%token<int> LPAREN '('
%token<int> RPAREN ')'
%token<int> NUM

%token<std::string> ID

%type<std::shared_ptr<Tree>> prog assign expr literal

/* operator precedence */
%right EQUAL
%left PLUS MINUS
%left TIMES DIVIDE


%start prog

%%

prog : assign { $$ = SP_NULL; }
     | prog ';' assign { $$ = SP_NULL; }
     ;

assign : ID '=' expr { $$ = SP_NULL; }
       | expr { $$ = $1; }
       ;

expr : literal { $$ = SP_NULL; }
     | expr '+' literal { $$ = SP_NULL; }
     | expr '-' literal { $$ = SP_NULL; }
     | expr '*' literal { $$ = SP_NULL; }
     | expr '/' literal { $$ = SP_NULL; }
     ;

literal : ID { $$ = SP_NULL; }
        | NUM { $$ = SP_NULL; }
        ;

%%

I followed the C++ variant section of the Bison manual. Bison generates parser.tab.cc, parser.tab.hh, and parser.output; the last file is only for analysis.

Since the Flex scanner is reentrant, I need to pass the scanner handle through the parser with %param {yyscan_t yyscanner}.

error message

Here's the error output from running make tree.out:

bison --debug --verbose -Wcounterexamples -o parser.tab.cc --defines=parser.tab.hh parser.y
flex --debug -o token.yy.cc --header-file=token.yy.hh token.l
clang++ -std=c++14 -g -c tree.cpp
clang++ -std=c++14 -g -c token.yy.cc token.yy.hh
token.yy.cc:820:10: error: use of undeclared identifier 'yyin'; did you mean 'yyg'?
                if ( ! yyin )
                       ^~~~
                       yyg
token.yy.cc:807:23: note: 'yyg' declared here
    struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
                      ^
token.yy.cc:822:4: error: use of undeclared identifier 'yyin'
                        yyin = stdin;
                        ^
token.yy.cc:827:10: error: use of undeclared identifier 'yyout'
                if ( ! yyout )
                       ^
token.yy.cc:829:4: error: use of undeclared identifier 'yyout'
                        yyout = stdout;
                        ^
token.yy.cc:837:23: error: use of undeclared identifier 'yyin'
                                yy_create_buffer( yyin, YY_BUF_SIZE , yyscanner);
                                                  ^
token.yy.cc:895:3: error: use of undeclared identifier 'YY_DO_BEFORE_ACTION'
                YY_DO_BEFORE_ACTION;
                ^
token.yy.cc:902:8: error: use of undeclared identifier 'yy_flex_debug'; did you mean 'yyget_debug'?
                if ( yy_flex_debug )
                     ^~~~~~~~~~~~~
                     yyget_debug
token.yy.cc:598:5: note: 'yyget_debug' declared here
int yyget_debug ( yyscan_t yyscanner );
    ^
token.yy.cc:908:45: error: use of undeclared identifier 'yytext'
                                         (long)yy_rule_linenum[yy_act], yytext );
                                                                        ^
token.yy.cc:911:14: error: use of undeclared identifier 'yytext'
                                         yytext );
                                         ^
token.l:12:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::PLUS); return yy::parser::token::PLUS; }
  ^
token.l:13:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::MINUS); return yy::parser::token::MINUS; }
  ^
token.l:14:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::TIMES); return yy::parser::token::TIMES; }
  ^
token.l:15:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::DIVIDE); return yy::parser::token::DIVIDE; }
  ^
token.l:16:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::LPAREN); return yy::parser::token::LPAREN; }
  ^
token.l:17:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::RPAREN); return yy::parser::token::RPAREN; }
  ^
token.l:18:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::SEMICOLON); return yy::parser::token::SEMICOLON; }
  ^
token.l:19:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<int>(yy::parser::token::EQUAL); return yy::parser::token::EQUAL; }
  ^
token.l:21:3: error: use of undeclared identifier 'yylval'
{ yylval->emplace<std::string>(yytext); return yy::parser::token::ID; }
  ^
token.l:21:32: error: use of undeclared identifier 'yytext'
{ yylval->emplace<std::string>(yytext); return yy::parser::token::ID; }
                               ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make: *** [token.yy.o] Error 1

Could you please help me solve these issues?

Upvotes: 1

Views: 1366

Answers (1)

linrongbin

Reputation: 3361

Well, I read the Bison manual again and solved the issue myself.

In the Bison C++ example, the declaration of yylex is redefined through the YY_DECL macro:

// Give Flex the prototype of yylex we want ...
# define YY_DECL \
  yy::parser::symbol_type yylex (driver& drv)
// ... and declare it for the parser's sake.
YY_DECL;

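With %define api.token.constructor, yylex returns a complete yy::parser::symbol_type instead of filling in yylval, so yylval is not declared in the scanner at all. That's why we can write something like this in a Flex rule: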

return yy::parser::make_MINUS  (loc);
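To make the mechanism concrete, here is a minimal, self-contained C++ sketch of the same YY_DECL pattern. It does not use Flex or Bison. The sketch namespace, symbol_type, the make_* helpers, and scanner_state are hypothetical stand-ins for what Bison generates and for the yyscan_t handle, and the function body that Flex would normally emit is written by hand. It only illustrates how one macro fixes a signature that returns a complete symbol, so no yylval is involved.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-ins for the types Bison would generate in parser.tab.hh.
namespace sketch {
struct location { int line; };

struct symbol_type {
  int kind;      // which token this is
  int value;     // semantic value (a real Bison variant can hold any declared type)
  location loc;  // source position
};

enum token_kind { END = 0, MINUS = 1, NUM = 2 };

// Analogues of Bison's generated make_XXX token constructors.
inline symbol_type make_END(location l) { return {END, 0, l}; }
inline symbol_type make_MINUS(location l) { return {MINUS, 0, l}; }
inline symbol_type make_NUM(int v, location l) { return {NUM, v, l}; }
}  // namespace sketch

// Hypothetical stand-in for the reentrant scanner state behind yyscan_t.
struct scanner_state {
  std::string input;
  std::size_t pos;
  explicit scanner_state(std::string text) : input(std::move(text)), pos(0) {}
};

// One macro fixes the scanner's prototype ...
#define YY_DECL sketch::symbol_type yylex(scanner_state& s)
// ... and declares it for the caller's sake.
YY_DECL;

// Flex would emit this definition from the rules; here it is hand-written.
// It recognizes '-', decimal integers, and spaces; anything else ends the scan.
YY_DECL {
  sketch::location loc{1};
  while (s.pos < s.input.size() && s.input[s.pos] == ' ') ++s.pos;
  if (s.pos >= s.input.size()) return sketch::make_END(loc);

  if (s.input[s.pos] == '-') {
    ++s.pos;
    return sketch::make_MINUS(loc);  // complete symbol, no yylval
  }

  int v = 0;
  bool any_digit = false;
  while (s.pos < s.input.size() && s.input[s.pos] >= '0' && s.input[s.pos] <= '9') {
    v = v * 10 + (s.input[s.pos] - '0');
    ++s.pos;
    any_digit = true;
  }
  if (any_digit) return sketch::make_NUM(v, loc);

  return sketch::make_END(loc);
}
```

In the real project, the same idea would presumably translate to defining YY_DECL as yy::parser::symbol_type yylex(yyscan_t yyscanner), so that the signature matches %param {yyscan_t yyscanner}. That variant is an untested assumption here, not something taken from the manual's example.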

Upvotes: 3
