Rabidsheep

Reputation: 55

Boost.Spirit Grammar. Attributes and _val Questions

I'm attempting to create a Boost.Spirit grammar class that can read a fairly simple grammar.

start   = roster;
roster  = *student;
student = int >> string;

The goal of the code is to create a tree of command objects based on an input file that is being parsed. The Iterator that this grammar is instantiated with is the Spirit file iterator.

Basically, what I am having trouble with is moving and using the synthesized attributes of each rule. What I need to do is create a tree of objects based on this data, and the only functions that create those objects require the parent object to be known at that time. I'm using the command pattern to delay creation until I have parsed all the data and can build the tree correctly. So far, my commands each contain a vector of other commands. When a command is executed, it requires only the parent object, and it creates and attaches the child object accordingly. The object then executes each of the commands in its own vector, passing itself as the parent. This creates the tree structure I need with the data intact.
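To make the deferred-creation idea concrete, here is a minimal sketch in plain C++ with no Spirit involved. `Node`, `Command`, `execute` and `make_example` are hypothetical names chosen for illustration; they are not the real T3 classes, and the real node-creation call is replaced by a simple `push_back`.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the real tree object, which needs its parent at creation time.
struct Node {
    std::string label;
    std::vector<Node> children;
};

// Hypothetical command: it records what to create and creates it only when executed.
struct Command {
    std::string type;
    std::vector<Command> children;

    void add_command(const Command& c) { children.push_back(c); }

    // Create this command's node under `parent`, then run every child command
    // with the freshly created node as its parent.
    void execute(Node& parent) const {
        parent.children.push_back(Node{type, {}});
        Node& self = parent.children.back();
        for (const Command& c : children)
            c.execute(self);
    }
};

// Build Roster -> Student -> (Age, Name) as commands; nothing is created yet.
Command make_example() {
    Command student{"Student", {}};
    student.add_command(Command{"Age", {}});
    student.add_command(Command{"Name", {}});
    Command roster{"Roster", {}};
    roster.add_command(student);
    return roster;
}
```

Calling `make_example().execute(root)` on an empty `Node root` would attach one "Roster" node that holds one "Student" node with two children. The commands can be assembled in any order during parsing, because no parent is needed until `execute` runs.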

The Issue:

The issue I am having is how to build the commands when the data is parsed, and how to load them into the appropriate vector. I've tried three different approaches so far.

  1. I tried changing the attribute of each rule to a std::vector and parsing the attributes in as commands one at a time. The problem is that this nests the vectors into a vector of vectors (std::vector<std::vector<...>>), which I couldn't work with.

  2. I tried using the boost::phoenix placeholder _val as a surrogate for the command being created. I was proud of this solution and a bit upset that it didn't work. I overloaded the += operator for all commands so that when A and B are both commands, A += B pushes B into A's command vector. _val isn't a Command, so the compiler didn't like this, and I couldn't tinker it into a workable state. This was the cleanest solution, and I would love for it to work if at all possible.

  3. The code in its current form has me attempting to bind the actions together. If I could bind a member function pointer to _val and pass it the created command, it would push the command back. Again, _val isn't actually a Command, so that didn't work out.
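For reference, the `A += B` behaviour described in attempt 2 looks like this on ordinary objects, outside any semantic action. This is only a sketch: `Command` and `make_student` are hypothetical stand-ins, and the operator is public here, whereas the T3_Command class posted further down declares `operator+=` under `private:`, which would block calls from outside the class.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for T3_Command, reduced to what += needs.
struct Command {
    std::string type;
    std::vector<Command> children;

    explicit Command(std::string t) : type(std::move(t)) {}

    // A += B appends a copy of B to A's child list and returns A,
    // so several appends can be chained.
    Command& operator+=(const Command& rhs) {
        children.push_back(rhs);
        return *this;
    }
};

// Build one "Student" command holding an "Age" and a "Name" child.
Command make_student() {
    Command student("Student");
    (student += Command("Age")) += Command("Name");
    return student;
}
```

Here both operands are real `Command` objects at the moment `+=` is called. Inside a semantic action the operands are placeholders, which is the part this sketch deliberately leaves out.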

I'm going to post this wall of code: it's the grammar I've written, cleaned up a bit, along with the point where it is invoked.

template <typename Iterator>
struct roster_grammar : qi::grammar<Iterator, qi::space_type, T3_Command()>
{   
//roster_grammar constructor
roster_grammar() : 
    roster_grammar::base_type(start_)
{
    using qi::lit;
    using qi::int_;
    using qi::char_;
    using qi::lexeme;

    start_ = student[boost::bind(&T3_Command::add_command, qi::_val, _1)];

    //I removed the roster for the time being to simplify the grammar.
    //It still contains my second solution that I detailed above. This
    //would be my ideal solution if it could work this way.
    //roster = *(student[qi::_val += _1]);

    student = 
        qi::eps       [ boost::bind(&T3_Command::set_identity, qi::_val, "Student") ]
        >>
        int_age       [ boost::bind(&T3_Command::add_command, qi::_val, _1) ]
        >>
        string_name   [ boost::bind(&T3_Command::add_command, qi::_val, _1) ];

    int_age     =
        int_          [ boost::bind(&Command_Factory::create_int_comm, &cmd_creator, "Age", _1) ];
    string_name =
        string_p      [ boost::bind(&Command_Factory::create_string_comm, &cmd_creator, "Name", _1) ];

    //The string parser. Returns type std::string
    string_p %= +qi::alnum;
}

qi::rule<Iterator,   qi::space_type, T3_Model_Command()>  roster;
qi::rule<Iterator,   qi::space_type, T3_Atom_Command()>   student;
qi::rule<Iterator,   qi::space_type, T3_Int_Command()>    int_age;
qi::rule<Iterator,   qi::space_type, T3_String_Command()> string_name;
qi::rule<Iterator,   qi::space_type, T3_Command()>        start_;
qi::rule<Iterator,   std::string()>  string_p;
Command_Factory cmd_creator;
};

This is how the grammar is being instantiated and used.

typedef boost::spirit::istream_iterator iter_type;
typedef roster_grammar<iter_type> student_p;
student_p my_parser;

//open the target file and wrap istream into the iterator
std::ifstream in(path);
in.unsetf(std::ios::skipws);//Disable Whitespace Skipping
iter_type begin(in);
iter_type end;

using boost::spirit::qi::space;
using boost::spirit::qi::phrase_parse;
bool r = phrase_parse(begin, end, my_parser, space);

So, long story short, I have a grammar that I want to build commands out of (called T3_Command). Commands have a std::vector data member that holds the other commands beneath them in the tree.

What I need is a clean way to create a Command in a semantic action, and to load it into the vector of another command (by way of attributes or plain function calls). Commands have a type that is specified at creation (it defines the type of tree node the command makes), and some commands have a data value (an int, string or float, all named value in their respective commands).

Or, if there is a better way to build a tree, I'd be open to suggestions. Thank you so much for any help you're able to give!

EDIT: I'll try to be more clear about the original problem I'm trying to solve. Thanks for the patience.

Given that grammar (or any grammar actually) I want to be able to parse through it and create a command tree based on the semantic actions taken within the parser. So using my sample grammar, and the input

"23 Bryan 45 Tyler 4 Stephen"

I would like the final tree to result in the following data structure.

Command with type = "Roster" holding 3 "Student" type commands.
Command with type = "Student" each holding an Int_Command and a String_Command
Int_Command holds the stored integer and String_Command the stored string.

E.g.

r1 - Roster - [s1][s2][s3]
s1 - Student - [int 23][string Bryan]
s2 - Student - [int 45][string Tyler]
s3 - Student - [int 4][string Stephen]
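As a baseline for comparison only, the same listing can be produced without Spirit by reading (int, string) pairs from a stream. This sketch assumes a single roster and whitespace-separated input; `Student`, `Roster`, `parse_roster` and `describe` are hypothetical names, not part of the T3 code.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Student { int age; std::string name; };
struct Roster  { std::vector<Student> students; };

// Read whitespace-separated (int, string) pairs until the input is
// exhausted or a pair fails to parse.
Roster parse_roster(const std::string& input) {
    std::istringstream in(input);
    Roster r;
    Student s;
    while (in >> s.age >> s.name)
        r.students.push_back(s);
    return r;
}

// Render the roster in the r1/s1 layout shown above.
std::string describe(const Roster& r) {
    std::ostringstream out;
    out << "r1 - Roster - ";
    for (std::size_t i = 0; i < r.students.size(); ++i)
        out << "[s" << (i + 1) << "]";
    out << "\n";
    for (std::size_t i = 0; i < r.students.size(); ++i)
        out << "s" << (i + 1) << " - Student - [int " << r.students[i].age
            << "][string " << r.students[i].name << "]\n";
    return out.str();
}
```

Feeding `"23 Bryan 45 Tyler 4 Stephen"` through `parse_roster` and then `describe` yields the four lines of the listing. What it does not do is build the command objects, which is the part I need from the grammar.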

This is the current structure of the commands I've written (The implementation is all trivial).

class T3_Command
{
    public:
        T3_Command(void);
        T3_Command(const std::string &type);
        ~T3_Command(void);

        //Executes this command and all subsequent commands in the command vector.
        void Execute(/*const Folder_in parent AND const Model_in parent*/);

        //Pushes the passed T3_Command into the Command Vector
        //@param comm - The command to be pushed.
        void add_command(const T3_Command &comm);

        //Sets the Identity of the command.
        //@param ID - the new identity to be set.
        void set_identity(const std::string &ID);

    private:    

        const std::string ident;
        std::vector <T3_Command> command_vec;
        T3_Command& operator+=(const T3_Command& rhs);
};

#pragma once
#include "T3_command.h"
class T3_Int_Command :
        public T3_Command
{
public:
        T3_Int_Command();
        T3_Int_Command(const std::string &type, const int val);
        ~T3_Int_Command(void);
        void Execute();
        void setValue(int val);
private:
        int value;
};

So the problem I am having is I would like to be able to create a data structure of various commands that represent the parse tree as spirit parses through it.

Upvotes: 2

Views: 1806

Answers (1)

sehe

Reputation: 393134

Updated in response to the edited question

Though there's still a lot of information missing (see my [new comment]), at least now you showed some input and output :)

So, without further ado, let me interpret those:

  • you still want to just parse (int, string) pairs, but per line

    • use qi::blank_type as a skipper
    • do roster % eol to parse roster lines
    • my sample parses into a vector of Rosters (one per line)
    • each roster contains a variable number of Students:

      start   = roster %  eol;
      roster  = +student;
      student = int_ >> string_p;
      

      Note: Rule #1 Don't complicate your parser unless you really have to

  • you want to output the individual elements ("commands"?!?) - I'm assuming the part where this would be non-trivial is the part where the same Student might appear in several rosters?

    • By defining a total ordering on Students:

      bool operator<(Student const& other) const {
          return boost::tie(i,s) < boost::tie(other.i, other.s);
      }
      

      you make it possible to store a unique collection of students in e.g. a std::set<Student>

  • perhaps generating the 'variable names' (I mean r1, s1, s2...) is part of the task as well. So, to establish a unique 'variable name' with each student I create a bi-directional map of Students (after parsing, see Rule #1: don't complicate the parser unless it's absolutely necessary):

    boost::bimap<std::string, Student> student_vars;
    auto generate_id = [&] () { return "s" + std::to_string(student_vars.size()+1); };
    
    for(Roster const& r: data)
        for(Student const& s: r.students)
            student_vars.insert({generate_id(), s});
    

That's about everything I can think of here. I used C++11 and Boost liberally to save on lines of code, but writing this without C++11/Boost would be fairly trivial too. C++03 version online now

The following sample input:

ParsedT3Data const data = parseData(
        "23 Bryan 45 Tyler 4 Stephen\n"
        "7 Mary 45 Tyler 8 Stephane\n"
        "23 Bryan 8 Stephane");

Results in (See it Live On Coliru):

parse (partial) success
s1 - Student - [int 23][string Bryan]
s2 - Student - [int 45][string Tyler]
s3 - Student - [int 4][string Stephen]
s4 - Student - [int 7][string Mary]
s5 - Student - [int 8][string Stephane]
r1 [s1][s2][s3]
r2 [s4][s2][s5]
r3 [s1][s5]

Full code:

#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/tuple/tuple_comparison.hpp>
#include <boost/bimap.hpp>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

namespace qi    = boost::spirit::qi;

struct Student
{
    int i;
    std::string s;

    bool operator<(Student const& other) const {
        return boost::tie(i,s) < boost::tie(other.i, other.s);
    }
    friend std::ostream& operator<<(std::ostream& os, Student const& o) {
        return os << "Student - [int " << o.i << "][string " << o.s << "]";
    }
};

struct Roster
{
    std::vector<Student> students;
};

BOOST_FUSION_ADAPT_STRUCT(Student, (int, i)(std::string, s))
BOOST_FUSION_ADAPT_STRUCT(Roster, (std::vector<Student>, students))

typedef std::vector<Roster> ParsedT3Data;

template <typename Iterator>
struct roster_grammar : qi::grammar<Iterator, ParsedT3Data(), qi::blank_type>
{   
    roster_grammar() : 
        roster_grammar::base_type(start)
    {
        using namespace qi;

        start   = roster %  eol;
        roster  = eps    >> +student; // known workaround
        student = int_   >> string_p;

        string_p = lexeme[+(graph)];

        BOOST_SPIRIT_DEBUG_NODES((start)(roster)(student)(string_p))
    }

    qi::rule <Iterator, ParsedT3Data(), qi::blank_type> start;
    qi::rule <Iterator, Roster(),       qi::blank_type> roster;
    qi::rule <Iterator, Student(),      qi::blank_type> student;
    qi::rule <Iterator, std::string()>  string_p;
};

ParsedT3Data parseData(std::string const& demoData)
{
    typedef boost::spirit::istream_iterator iter_type;
    typedef roster_grammar<iter_type> student_p;
    student_p my_parser;

    //open the target file and wrap istream into the iterator
    std::istringstream iss(demoData);
    iss.unsetf(std::ios::skipws);//Disable Whitespace Skipping

    iter_type begin(iss), end;
    ParsedT3Data result;
    bool r = phrase_parse(begin, end, my_parser, qi::blank, result);

    if (r)
        std::cout << "parse (partial) success\n";
    else      
        std::cerr << "parse failed: '" << std::string(begin,end) << "'\n";
    if (begin!=end) 
        std::cerr << "trailing unparsed: '" << std::string(begin,end) << "'\n";

    if (!r) 
        throw "TODO error handling";

    return result;
}

int main()
{
    ParsedT3Data const data = parseData(
            "23 Bryan 45 Tyler 4 Stephen\n"
            "7 Mary 45 Tyler 8 Stephane\n"
            "23 Bryan 8 Stephane");

    // now produce that list of stuff :)
    boost::bimap<std::string, Student> student_vars;
    auto generate_id = [&] () { return "s" + std::to_string(student_vars.size()+1); };

    for(Roster const& r: data)
        for(Student const& s: r.students)
            student_vars.insert({generate_id(), s});

    for(auto const& s: student_vars.left)
        std::cout << s.first << " - " << s.second << "\n";

    int r_id = 1;
    for(Roster const& r: data)
    {
        std::cout << "r" << (r_id++) << " ";
        for(Student const& s: r.students)
            std::cout << "[" << student_vars.right.at(s) << "]";
        std::cout << "\n";
    }
}

OLD ANSWER

I'll respond to individual points, while awaiting more information:

1. "The issue with this is it nests the vectors into std::vector<std::vector<...>> type data, which I couldn't work with"

A solution here would be

  • boost::container::vector<>, which allows incomplete element types at time of instantiation (Boost containers have several other nifty properties, go read about them!)
  • boost::variant with recursive_wrapper<>, so you can indeed make logical trees. I have many answers under the relevant tags that show this approach (e.g. for expression trees).
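
To illustrate the shape of such a tree without pulling in Boost, here is a sketch that swaps in C++17 `std::variant` for the leaf values and a `std::vector<Node>` member for the recursion (since C++17, `std::vector` may be declared with an incomplete element type). It is a standard-library analogue, not the boost::variant/recursive_wrapper code itself; `Node` and `make_student` are hypothetical names.

```cpp
#include <string>
#include <variant>
#include <vector>

// One tree node: a type tag, an optional payload, and any number of children.
// The vector member is what makes the type recursive.
struct Node {
    std::string type;
    std::variant<std::monostate, int, std::string> value;  // monostate = no payload
    std::vector<Node> children;
};

// Build a "Student" node holding an int leaf and a string leaf.
Node make_student(int age, const std::string& name) {
    Node student{"Student", std::monostate{}, {}};
    student.children.push_back(Node{"Age", age, {}});
    student.children.push_back(Node{"Name", name, {}});
    return student;
}
```

A payload can be read back with `std::get<int>(node.value)`, or inspected generically with `std::visit`.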

2. Calling factory methods from semantic actions

I have a few minor hints:

  • you can use qi::_1, qi::_2... to refer to the elements of a compound attribute
  • you should prefer using phoenix::bind inside Phoenix actors (semantic actions are Phoenix actors)
  • you can assign to qi::_pass to indicate parser failure

Here's a simplified version of the grammar, which shows these in action. I haven't actually built a tree, since you didn't describe any of the desired behaviour. Instead, I just print a debug line on adding nodes to the tree.

See it Live on Coliru

#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <fstream>
#include <iostream>
#include <string>

namespace qi    = boost::spirit::qi;
namespace phx   = boost::phoenix;

struct T3_Command
{
    bool add_command(int i, std::string const& s) 
    {
        std::cout << "adding command [" << i << ", " << s << "]\n";
        return i != 42; // just to show how you can do input validation
    }
};

template <typename Iterator>
struct roster_grammar : qi::grammar<Iterator, T3_Command(), qi::space_type>
{   
    roster_grammar() : 
        roster_grammar::base_type(start_)
    {
        start_   = *(qi::int_ >> string_p) 
            [qi::_pass = phx::bind(&T3_Command::add_command, qi::_val, qi::_1, qi::_2)];

        string_p = qi::lexeme[+(qi::graph)];
    }

    qi::rule <Iterator, T3_Command(), qi::space_type> start_;
    qi::rule <Iterator, std::string()> string_p;
};

int main()
{
    typedef boost::spirit::istream_iterator iter_type;
    typedef roster_grammar<iter_type> student_p;
    student_p my_parser;

    //open the target file and wrap istream into the iterator
    std::ifstream in("input.txt");
    in.unsetf(std::ios::skipws);//Disable Whitespace Skipping
    iter_type begin(in);
    iter_type end;

    using boost::spirit::qi::space;
    using boost::spirit::qi::phrase_parse;
    bool r = phrase_parse(begin, end, my_parser, space);

    if (r)
        std::cout << "parse (partial) success\n";
    else      
        std::cerr << "parse failed: '" << std::string(begin,end) << "'\n";
    if (begin!=end) 
        std::cerr << "trailing unparsed: '" << std::string(begin,end) << "'\n";

    return r?0:255;
}

Input:

1 klaas-jan
2 doeke-jan
3 jan-herbert
4 taeke-jan
42 oops-invalid-number
5 not-parsed

Output:

adding command [1, klaas-jan]
adding command [2, doeke-jan]
adding command [3, jan-herbert]
adding command [4, taeke-jan]
adding command [42, oops-invalid-number]
parse (partial) success
trailing unparsed: '42 oops-invalid-number
5 not-parsed
'

Upvotes: 2
