Nedeljko

Reputation: 35

Porting C bison parser to C++

I use a reentrant C-based flex scanner and a reentrant C-based bison parser. This works fine.

I want to keep the reentrant C-based flex scanner but switch to a reentrant C++-based bison parser.

Why? When the input is too complex (for example, a very long if-else-if chain), bison's limit on parser stack size becomes a problem. I also dislike undocumented solutions, and solutions whose authors say that something is going to change. That is why I want to keep the C-based flex scanner.

I also need to use specific prefixes, suffixes, or namespaces so that I can use multiple reentrant C-based flex scanners and reentrant C++-based bison parsers together.

Thanks

Upvotes: 0

Views: 608

Answers (2)

rici

Reputation: 241671

You're under no obligation to use flex's C++ template in a C++ program. The C code generated by flex should compile and execute correctly in C++. (At least, I haven't seen a problem.) As indicated in the flex documentation, any C++ incompatibility in the generated C code will be treated as a bug and should be reported as such.

Having said that, if your only concern with bison's C template is the limitation on parser stack size, you can easily increase the limit by defining YYMAXDEPTH. In particular, you can essentially avoid any limit by using:

#define YYMAXDEPTH (YYSTACK_ALLOC_MAXIMUM / YYSTACK_BYTES (1))

(That formula was derived from a comment in the skeleton data/yacc.c:

/* YYMAXDEPTH -- maximum size the stacks can grow to (effective only
   if the built-in stack extension method is used).

   Do not make this value too large; the results are undefined if
   YYSTACK_ALLOC_MAXIMUM < YYSTACK_BYTES (YYMAXDEPTH)
   evaluated with infinite-precision integer arithmetic.  */

I don't know whether that comment really constitutes formal documentation, but it seems to be addressed to people who might consider changing the value of YYMAXDEPTH, so I'm assuming that it's OK to rely on it. :-) ) On the other hand, you might well consider imposing some smaller limit, since simply leaving it to malloc() to report allocation failure (which is the result of the above #define) is notoriously unreliable on platforms that use optimistic memory allocation.
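
As a rough illustration of choosing a smaller limit, the sketch below derives a depth from a memory budget. It assumes the stack cost grows linearly with depth plus a fixed padding term. The names StackEntrySizes and max_depth_for_budget are hypothetical and not part of bison. In a real parser the sizes would come from the generated code (the state type, YYSTYPE and, if locations are used, YYLTYPE), so treat the result as an estimate only.

```cpp
#include <cstddef>

// Hypothetical per-entry sizes (assumption): in a real parser these would be
// the sizes of the state type, the semantic value, and the location.
struct StackEntrySizes {
    std::size_t state_bytes;
    std::size_t value_bytes;
    std::size_t location_bytes;  // 0 when locations are not used
};

// Largest depth N such that N * per_entry + gap_bytes <= budget_bytes.
// Returns 0 when the budget cannot hold even the padding.
inline std::size_t max_depth_for_budget(std::size_t budget_bytes,
                                        std::size_t gap_bytes,
                                        StackEntrySizes sizes) {
    const std::size_t per_entry =
        sizes.state_bytes + sizes.value_bytes + sizes.location_bytes;
    if (per_entry == 0 || budget_bytes <= gap_bytes) {
        return 0;
    }
    return (budget_bytes - gap_bytes) / per_entry;
}
```

The number this returns for your own budget could then be used as the value in a #define YYMAXDEPTH line placed in the grammar's prologue.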

The issue with optimistic memory allocation is not avoided by switching to C++, since bison's C++ template relies on std::vector to report memory allocation failure, and none of the standard library implementations I know of attempt to predict future unavailability of optimistically allocated memory.
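
If you want an explicit limit on the C++ side as well, one possible approach is a thin wrapper that refuses to grow past a configured depth, rather than waiting for the allocator to fail. The following is only a sketch of that idea: BoundedStack is a hypothetical class, not something bison's C++ skeleton provides, and wiring it into a generated parser is not shown.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical wrapper (not part of bison): a vector-backed stack with an
// explicit depth limit that is checked before every push.
template <typename T>
class BoundedStack {
public:
    explicit BoundedStack(std::size_t max_depth) : max_depth_(max_depth) {}

    // Throws std::length_error once the configured depth is reached.
    void push(T value) {
        if (items_.size() >= max_depth_) {
            throw std::length_error("stack depth limit exceeded");
        }
        items_.push_back(std::move(value));
    }

    void pop() {
        if (items_.empty()) {
            throw std::out_of_range("pop on empty stack");
        }
        items_.pop_back();
    }

    const T& top() const {
        if (items_.empty()) {
            throw std::out_of_range("top on empty stack");
        }
        return items_.back();
    }

    std::size_t size() const { return items_.size(); }

private:
    std::size_t max_depth_;
    std::vector<T> items_;
};
```

With a limit of 2, for example, the third push throws and the two stored elements stay intact, so the failure is deterministic instead of depending on when the operating system runs out of memory.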

Upvotes: 0

user6889435


Assuming you want to write parsers in C++ using Flex and Bison, here is an answer.

This article answers your question. The author uses Flex and Bison from C++ in an object-oriented manner. With Flex, you can make use of %{ ... %} code blocks. With Bison, this is what a very generic C++ parser looks like, according to the author:

%skeleton "lalr1.cc"
%require  "3.0"
%debug 
%defines 
%define api.namespace {MC}
%define parser_class_name {MC_Parser}

%code requires{
   namespace MC {
      class MC_Driver;
      class MC_Scanner;
   }

// The following definition is missing when %locations isn't used
# ifndef YY_NULLPTR
#  if defined __cplusplus && 201103L <= __cplusplus
#   define YY_NULLPTR nullptr
#  else
#   define YY_NULLPTR 0
#  endif
# endif

}

%parse-param { MC_Scanner  &scanner  }
%parse-param { MC_Driver  &driver  }

%code{
   #include <iostream>
   #include <cstdlib>
   #include <fstream>

   /* include for all driver functions */
   #include "mc_driver.hpp"

#undef yylex
#define yylex scanner.yylex
}

%define api.value.type variant
%define parse.assert

%token               END    0     "end of file"
%token               UPPER
%token               LOWER
%token <std::string> WORD
%token               NEWLINE
%token               CHAR

%locations

%%

list_option : END | list END;

list
  : item
  | list item
  ;

item
  : UPPER   { driver.add_upper(); }
  | LOWER   { driver.add_lower(); }
  | WORD    { driver.add_word( $1 ); }
  | NEWLINE { driver.add_newline(); }
  | CHAR    { driver.add_char(); }
  ;

%%


void 
MC::MC_Parser::error( const location_type &l, const std::string &err_message )
{
   std::cerr << "Error: " << err_message << " at " << l << "\n";
} 

... and the driver code:

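#include <cctype>
#include <fstream>
#include <cassert>
#include <cstdlib>   // exit, EXIT_FAILURE
#include <iostream>  // std::cerr
#include <new>       // std::bad_alloc

#include "mc_driver.hpp"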

MC::MC_Driver::~MC_Driver()
{
   delete(scanner);
   scanner = nullptr;
   delete(parser);
   parser = nullptr;
}

void 
MC::MC_Driver::parse( const char * const filename )
{
   /**
    * Remember, if you want to have checks in release mode
    * then this needs to be an if statement 
    */
   assert( filename != nullptr );
   std::ifstream in_file( filename );
   if( ! in_file.good() )
   {
       exit( EXIT_FAILURE );
   }
   parse_helper( in_file );
   return;
}

void
MC::MC_Driver::parse( std::istream &stream )
{
   if( ! stream.good()  && stream.eof() )
   {
       return;
   }
   //else
   parse_helper( stream ); 
   return;
}


void 
MC::MC_Driver::parse_helper( std::istream &stream )
{

   delete(scanner);
   try
   {
      scanner = new MC::MC_Scanner( &stream );
   }
   catch( std::bad_alloc &ba )
   {
      std::cerr << "Failed to allocate scanner: (" <<
         ba.what() << "), exiting!!\n";
      exit( EXIT_FAILURE );
   }

   delete(parser); 
   try
   {
      parser = new MC::MC_Parser( (*scanner) /* scanner */, 
                                  (*this) /* driver */ );
   }
   catch( std::bad_alloc &ba )
   {
      std::cerr << "Failed to allocate parser: (" << 
         ba.what() << "), exiting!!\n";
      exit( EXIT_FAILURE );
   }
   const int accept( 0 );
   if( parser->parse() != accept )
   {
      std::cerr << "Parse failed!!\n";
   }
   return;
}

void 
MC::MC_Driver::add_upper()
{ 
   uppercase++; 
   chars++; 
   words++; 
}

void 
MC::MC_Driver::add_lower()
{ 
   lowercase++; 
   chars++; 
   words++; 
}

void 
MC::MC_Driver::add_word( const std::string &word )
{
   words++; 
   chars += word.length();
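   for(const char &c : word ){
      /* cast first: passing a negative char to islower/isupper is undefined */
      const unsigned char uc = static_cast<unsigned char>( c );
      if( islower( uc ) )
      { 
         lowercase++; 
      }
      else if ( isupper( uc ) ) 
      { 
         uppercase++; 
      }
   }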
}

void 
MC::MC_Driver::add_newline()
{ 
   lines++; 
   chars++; 
}

void 
MC::MC_Driver::add_char()
{ 
   chars++; 
}


std::ostream& 
MC::MC_Driver::print( std::ostream &stream )
{
   /** NOTE: Colors are defined as class variables w/in MC_Driver **/
   stream << red  << "Results: " << norm << "\n";
   stream << blue << "Uppercase: " << norm << uppercase << "\n";
   stream << blue << "Lowercase: " << norm << lowercase << "\n";
   stream << blue << "Lines: " << norm << lines << "\n";
   stream << blue << "Words: " << norm << words << "\n";
   stream << blue << "Characters: " << norm << chars << "\n";
   return(stream);
}
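
The character classification done in add_word can also be tried in isolation, without the parser or scanner. The sketch below is a standalone variant under that assumption: CaseCounts and count_case are hypothetical names that do not appear in the article's driver. It converts each character to unsigned char before calling the <cctype> functions, because passing a negative char value to them is undefined behavior.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type: upper- and lowercase letter counts for one word.
struct CaseCounts {
    std::size_t upper = 0;
    std::size_t lower = 0;
};

// Counts letters by case; digits and punctuation are ignored.
inline CaseCounts count_case(const std::string& word) {
    CaseCounts counts;
    for (const char c : word) {
        // Convert before classifying so negative char values are safe.
        const unsigned char uc = static_cast<unsigned char>(c);
        if (std::islower(uc)) {
            ++counts.lower;
        } else if (std::isupper(uc)) {
            ++counts.upper;
        }
    }
    return counts;
}
```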

A much better explanation of this code (compared to just the comments above) can be found on the author's website, linked two paragraphs above. I really advise you to check it out.

Upvotes: 1
