Maik

Reputation: 559

boost::spirit::qi::parse grammar not working as expected - part 2

A few days ago I asked this question

One point was left open there: how to handle values such as -23.0 in my example. Such a string should be parsed as a value (stored as a string) and not as an option.

I have now tried to extend the proposed grammar, but again without success. I also tried to relax my requirements, so I think it is acceptable to require that an argument starts with a double dash "--". The idea was to get a unique identifier for the argument. This is my current grammar, but the parsing fails and I have no clue why:

//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Structure stores the parsed command line information:
struct CmdData
{
    typedef std::string               Name;

    typedef std::string               ArgName;
    typedef std::string               Value;

    typedef std::vector<Value>        Values;  // List of values.
    typedef std::map<ArgName, Values> Args;    // Maps an argument name to its values.

    Name cmd; // Stores the command name as a string.
    Args arg; // Stores the arguments and the corresponding values as strings.
};

BOOST_FUSION_ADAPT_STRUCT(CmdData, (CmdData::Name, cmd)(CmdData::Args, arg))

namespace Grammar
{
    namespace qi = boost::spirit::qi;

    // This class implements the grammar used to parse a command line.
    // The expected format is as follows:
    // - command
    // - command value0 ... valueN
    // - command --arg0 ... --argN
    // - command --arg0 value0 ... valueN ... --argN value0 ... valueN
    template <typename It>
    struct decode : qi::grammar<It, CmdData()>
    {
    decode() : decode::base_type(data)
    {
        using namespace qi;

        token  = +( ~char_( "\r\n -" ) );
        values = +( ~char_( "--" ) >> +token );

        //
        entry  = (lexeme[ "--" >> token ] >> -values | attr( "empty" ) >> values );
        args   = *entry;

        //
        data   = skip(qi::blank) [ token >> args ];

        BOOST_SPIRIT_DEBUG_NODES( (token)(values)(entry)(args)(data) )
    }

private:
    qi::rule<It, CmdData()> data;

    // The following variables define the rules used within this grammar:
    typedef std::pair<CmdData::ArgName, CmdData::Values> Entry;
    qi::rule<It, CmdData::Values(), qi::blank_type> values;
    qi::rule<It, Entry(),           qi::blank_type> entry;
    qi::rule<It, CmdData::Args(),   qi::blank_type> args;

    // lexemes
    qi::rule<It, std::string()> token;
    };

}   // namespace

bool parse(const std::string& in)
{
    CmdData data;

    // Create an instance of the used grammar:
    Grammar::decode<std::string::const_iterator> gr;

    // Try to parse the input according to the grammar and store the result in data:
    bool b = boost::spirit::qi::parse(in.begin(), in.end(), gr, data);

    std::cout << "Parsing: '" << in << "' ok: " << std::boolalpha << b << "\n";
    if (b)
        std::cout << "Entries parsed: " << data.arg.size() << "\n";

    return b;
}

int main()
{
    parse("   cmd0");
    parse("   cmd0        value0  value1  value2 -23.0");
    parse("   cmd0  -23.0 value0  value1  value2");
    parse("   cmd0  --arg0  --arg1  123 --arg2 -23.0");
    parse("   cmd0  --arg0  value0  --arg1  value0  value1  --arg2  value0  value1  value2");
}

Upvotes: 1

Views: 98

Answers (1)

Chris Beck

Reputation: 16214

Ok, I played with your grammar, and I think I got it to work.

Let me make the disclaimer that I am not an expert in boost spirit and I have only a medium level of experience.

Here were the things I changed:

  1. I don't know what the ~ operator is in Spirit; it's not documented here: http://www.boost.org/doc/libs/1_44_0/libs/spirit/doc/html/spirit/qi/reference/operator.html In my version I removed it.

  2. I think you were using ~ to mean "not these characters". I usually do that with the - operator: I write a "general" expression and then exclude things from it using -.

  3. I got rid of all your skip grammars and just added a whitespace rule. As long as the whitespace rule has no attribute, it won't affect the automatic attribute deduction, because its attribute is qi::unused_type. That probably wasn't necessary or optimal, but it was the quickest way for me to get to a working answer.

  4. I think I fixed two major problems in your grammar. The first was using ~char_( "--" ) where you should have used something like - "--" or - lit("--"), as pointed out by cv_and_he in the comments. The second was the part where you parse the argument labels, "--" >> token, without using lit, which I suspect confused the automatic attribute collection.
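To make the difference in point 4 concrete: as far as I can tell, ~ applied to a char parser negates its character set, and the set written as "--" contains only the single character '-'. So +~char_("--") stops at the first dash, while +(char_ - lit("--")) stops only at a double dash. The sketch below is not Spirit code; it is a plain C++ emulation of what each expression would consume, and the function names are made up for illustration.

```cpp
#include <string>

// Emulates +~char_("--"): the set holds only '-', so matching
// stops at the first single dash. An empty result means "no match".
std::string take_until_dash(const std::string& in)
{
    return in.substr(0, in.find('-'));
}

// Emulates +(char_ - lit("--")): any character is accepted until
// the two-character sequence "--" starts. Empty result means "no match".
std::string take_until_double_dash(const std::string& in)
{
    return in.substr(0, in.find("--"));
}
```

With the input "-23.0" the first function returns an empty string (the value is rejected at its leading minus), while the second returns the whole string, which is the behaviour the question asks for.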

Here's what I ended up with:

#define BOOST_SPIRIT_USE_PHOENIX_V3

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_fusion.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/fusion/adapted/struct/adapt_struct.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/std_pair.hpp>

#include <iostream>
#include <map>
#include <string>
#include <vector>

// Structure stores the parsed command line information:
struct CmdData
{
    typedef std::string               Name;

    typedef std::string               ArgName;
    typedef std::string               Value;

    typedef std::vector<Value>        Values;  // List of values.
    typedef std::map<ArgName, Values> Args;    // Maps an argument name to its values.

    Name cmd; // Stores the command name as a string.
    Args arg; // Stores the arguments and the corresponding values as strings.
};

BOOST_FUSION_ADAPT_STRUCT(CmdData, (CmdData::Name, cmd)(CmdData::Args, arg))

namespace Grammar
{
    namespace qi = boost::spirit::qi;

    // This class implements the grammar used to parse a command line.
    // The expected format is as follows:
    // - command
    // - command value0 ... valueN
    // - command --arg0 ... --argN
    // - command --arg0 value0 ... valueN ... --argN value0 ... valueN
    template <typename It>
    struct decode : qi::grammar<It, CmdData()>
    {
    decode() : decode::base_type(data)
    {
        using namespace qi;

        ws = char_("\r\n ");
        token  = +( char_ - ws - lit("--") );
        values = token % (+ws);

        //
        arg_label = lit("--") >> token;
        entry  = arg_label >> -(+ws >> values);
        args   = entry % (+ws);

        //
        data   = *ws >> token >> -(+ws >> args) >> *ws;

        BOOST_SPIRIT_DEBUG_NODES( (token)(values)(entry)(args)(data) )
    }

private:
    qi::rule<It, CmdData()> data;

    // The following variables define the rules used within this grammar:
    typedef std::pair<CmdData::ArgName, CmdData::Values> Entry;
    qi::rule<It, CmdData::Values()> values;
    qi::rule<It, Entry()> entry;
    qi::rule<It, CmdData::Args()> args;

    // lexemes
    qi::rule<It, std::string()> token;
    qi::rule<It, std::string()> arg_label;
    qi::rule<It> ws;
    };

}   // namespace

bool parse(const std::string& in)
{
    CmdData data;

    // Create an instance of the used grammar:
    Grammar::decode<std::string::const_iterator> gr;

    // Try to parse the input according to the grammar and store the result in data:
    bool b = boost::spirit::qi::parse(in.begin(), in.end(), gr, data);

    std::cout << "Parsing: '" << in << "' ok: " << std::boolalpha << b << "\n";
    if (b) {
        std::cout << "Entries parsed: " << data.arg.size() << "\n";

        for (const auto & p : data.arg) {
            std::cout << "  " << p.first;
            bool first = true;
            for (const auto & v : p.second) {
                if (first) {
                    std::cout << " : ";
                    first = false;
                } else {
                    std::cout << " , ";
                }
                std::cout << v;
            }
            std::cout << std::endl;
        }
    }

    return b;
}

int main()
{
    parse("   cmd0");
    parse("   cmd0        value0  value1  value2 -23.0");
    parse("   cmd0  -23.0 value0  value1  value2");
    parse("   cmd0  --arg0  --arg1  123 --arg2 -23.0");
    parse("   cmd0  --arg0  value0  --arg1  value0  value1  --arg2  value0  value1  value2");
}

Compiled with gcc version 4.8.4. Here's my output:

$ g++ -std=c++11 main.cpp -o main
$ ./main 
Parsing: '   cmd0' ok: true
Entries parsed: 0
Parsing: '   cmd0        value0  value1  value2 -23.0' ok: true
Entries parsed: 0
Parsing: '   cmd0  -23.0 value0  value1  value2' ok: true
Entries parsed: 0
Parsing: '   cmd0  --arg0  --arg1  123 --arg2 -23.0' ok: true
Entries parsed: 3
  arg0
  arg1 : 123
  arg2 : -23.0
Parsing: '   cmd0  --arg0  value0  --arg1  value0  value1  --arg2  value0  value1  value2' ok: true
Entries parsed: 3
  arg0 : value0
  arg1 : value0 , value1
  arg2 : value0 , value1 , value2

Edit:

As pointed out in the comments, my first answer wasn't correct because it didn't handle the "empty" argument type. I see now that the answer to part 1 did that part correctly. In this version I fixed that, and I also cleaned up the whitespace handling so that it is closer to the original code sample.

#define BOOST_SPIRIT_USE_PHOENIX_V3

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_object.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_fusion.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>
#include <boost/fusion/adapted/struct/adapt_struct.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/std_pair.hpp>

#include <iostream>
#include <map>
#include <string>
#include <vector>

// Structure stores the parsed command line information:
struct CmdData
{
    typedef std::string               Name;

    typedef std::string               ArgName;
    typedef std::string               Value;

    typedef std::vector<Value>        Values;  // List of values.
    typedef std::map<ArgName, Values> Args;    // Maps an argument name to its values.

    Name cmd; // Stores the command name as a string.
    Args arg; // Stores the arguments and the corresponding values as strings.
};

BOOST_FUSION_ADAPT_STRUCT(CmdData, (CmdData::Name, cmd)(CmdData::Args, arg))

namespace Grammar
{
    namespace qi = boost::spirit::qi;

    // This class implements the grammar used to parse a command line.
    // The expected format is as follows:
    // - command
    // - command value0 ... valueN
    // - command --arg0 ... --argN
    // - command --arg0 value0 ... valueN ... --argN value0 ... valueN
    template <typename It>
    struct decode : qi::grammar<It, CmdData()>
    {
    decode() : decode::base_type(data)
    {
        using namespace qi;

        token  = +( char_ - blank - lit("--") );

        //
        arg_label = lit("--") >> token;
        entry  = skip(blank) [
                     (arg_label >> *token) | ( attr("empty") >> +token)
                 ];
        args   = *entry;

        //
        data   = skip(blank) [ token >> args ];

        BOOST_SPIRIT_DEBUG_NODES( (token)(entry)(args)(arg_label)(data) )
    }

private:
    qi::rule<It, CmdData()> data;

    // The following variables define the rules used within this grammar:
    typedef std::pair<CmdData::ArgName, CmdData::Values> Entry;
    qi::rule<It, Entry()> entry;
    qi::rule<It, CmdData::Args()> args;

    // lexemes
    qi::rule<It, std::string()> token;
    qi::rule<It, std::string()> arg_label;
    };

}   // namespace

bool parse(const std::string& in)
{
    CmdData data;

    // Create an instance of the used grammar:
    Grammar::decode<std::string::const_iterator> gr;

    // Try to parse the input according to the grammar and store the result in data:
    bool b = boost::spirit::qi::parse(in.begin(), in.end(), gr, data);

    std::cout << "Parsing: '" << in << "' ok: " << std::boolalpha << b << "\n";
    if (b) {
        std::cout << "Entries parsed: " << data.arg.size() << "\n";

        for (const auto & p : data.arg) {
            std::cout << "  " << p.first;
            bool first = true;
            for (const auto & v : p.second) {
                if (first) {
                    std::cout << " : ";
                    first = false;
                } else {
                    std::cout << " , ";
                }
                std::cout << v;
            }
            std::cout << std::endl;
        }
    }

    return b;
}

int main()
{
    parse("   cmd0");
    parse("   cmd0        value0  value1  value2 -23.0");
    parse("   cmd0  -23.0 value0  value1  value2");
    parse("   cmd0  --arg0  --arg1  123 --arg2 -23.0");
    parse("   cmd0  --arg0  value0  --arg1  value0  value1  --arg2  value0  value1  value2");
}

My output is now like this:

$ ./main 
Parsing: '   cmd0' ok: true
Entries parsed: 0
Parsing: '   cmd0        value0  value1  value2 -23.0' ok: true
Entries parsed: 1
  empty : value0 , value1 , value2 , -23.0
Parsing: '   cmd0  -23.0 value0  value1  value2' ok: true
Entries parsed: 1
  empty : -23.0 , value0 , value1 , value2
Parsing: '   cmd0  --arg0  --arg1  123 --arg2 -23.0' ok: true
Entries parsed: 3
  arg0
  arg1 : 123
  arg2 : -23.0
Parsing: '   cmd0  --arg0  value0  --arg1  value0  value1  --arg2  value0  value1  value2' ok: true
Entries parsed: 3
  arg0 : value0
  arg1 : value0 , value1
  arg2 : value0 , value1 , value2

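I had to change things around a little in this version because I was getting an infinite loop with *entry combined with attr("empty") >> *token. My understanding is that attr("empty") >> *token can succeed without consuming any input, so *entry keeps matching it forever; requiring +token in that branch avoids this. I think this is probably the simplest way to get it working while still relying entirely on automatic attribute propagation, but I'm not sure.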
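If you want something to check the grammar's results against, a hand-written version of the same rules is easy to sketch. The code below uses no Spirit at all, and ParsedCmd, split_blank and parse_by_hand are hypothetical names standing in for CmdData and the grammar. It assumes that words are separated by spaces or tabs, that a word longer than two characters starting with "--" opens a new argument, and that values seen before any label go under "empty". It does not validate malformed input, and unlike the grammar it merges the values of a label that appears twice.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the CmdData struct used in this thread.
struct ParsedCmd
{
    std::string cmd;
    std::map<std::string, std::vector<std::string>> arg;
};

// Split the input on blanks (space or tab).
std::vector<std::string> split_blank(const std::string& in)
{
    std::vector<std::string> words;
    std::string cur;
    for (char c : in) {
        if (c == ' ' || c == '\t') {
            if (!cur.empty()) { words.push_back(cur); cur.clear(); }
        } else {
            cur += c;
        }
    }
    if (!cur.empty()) words.push_back(cur);
    return words;
}

// Returns false when the input contains no command word at all.
bool parse_by_hand(const std::string& in, ParsedCmd& out)
{
    const std::vector<std::string> words = split_blank(in);
    if (words.empty()) return false;

    out.cmd = words[0];
    std::string current = "empty";  // bucket for values seen before any --label
    for (std::size_t i = 1; i < words.size(); ++i) {
        const std::string& w = words[i];
        if (w.size() > 2 && w.compare(0, 2, "--") == 0) {
            current = w.substr(2);
            out.arg[current];       // register the label even if no values follow
        } else {
            out.arg[current].push_back(w);
        }
    }
    return true;
}
```

Because a single leading minus never matches the "--" prefix test, "-23.0" ends up as a value here too, so the five test inputs above should give the same entry counts as the Spirit version.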

Upvotes: 2
