pschulz

Reputation: 1525

Boost Spirit skip parser with at least one whitespace

In the grammar I'm implementing, elements are separated by whitespace. A skip parser skips the spaces between the elements automatically, but it also accepts no space at all, which is not what I want. I could write a grammar that matches these spaces explicitly, but given the complexity and flexibility Spirit offers, it seems there should be a better way. Is there? Here is an example:

#include <cstdlib>
#include <iostream>
#include <string>

#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

int main(int argc, char** argv)
{
    if(argc != 2)
    {
        std::exit(1);
    }
    std::string str = argv[1];
    auto iter = str.begin();
    bool r = qi::phrase_parse(iter, str.end(), qi::char_ >> qi::char_, qi::blank);

    if (r && iter == str.end())
    {
        std::cout << "parse succeeded\n";
    }
    else
    {
        std::cout << "parse failed. Remaining unparsed: " << std::string(iter, str.end()) << '\n';
    }
}

This accepts "ab" as well as "a b". I want only the latter to be accepted.

Related to this: how exactly do skip parsers work? You supply something like qi::blank. Is the Kleene star then applied to it to form the skip parser? I would like some enlightenment here, and maybe it also helps solve this problem.

Additional information: My real parser looks something like this:

one   = char_("X") >> repeat(2)[omit[+blank] >> +alnum] >> omit[+blank] >> +alnum;
two   = char_("Y") >> repeat(3)[omit[+blank] >> +alnum];
three = char_("Z") >> repeat(4)[omit[+blank] >> +alnum] >> omit[+blank] >> +alnum;

main = one | two | three;

This makes the grammar quite noisy, which I would like to avoid.

Upvotes: 1

Views: 1310

Answers (1)

sehe

Reputation: 392833

First off, the grammar specifications where I usually see this kind of requirement are (always?) RFCs. In 99% of cases it is not an issue. Consider, e.g.:

 myrule = skip(space) [ uint_ >> uint_ ];

This already implicitly requires at least one whitespace character between the numbers, for the simple reason that otherwise there would be only one number. The same simplification occurs in surprisingly many cases (see e.g. the simplifications made around the ubiquitous WSP productions in my answer from last week, "Boost.Spirit qi value sequence vector").


With that out of the way: skippers apply zero or more times, by definition. So no, there is no way to get what you want with an existing stateful directive like skip(). See also http://stackoverflow.com/questions/17072987/boost-spirit-skipper-issues/17073965#17073965 or the docs (under lexeme, [no_]skip and skip_flag::dont_postskip).


Looking at your specific grammar, I'd do this:

bool r = qi::phrase_parse(iter, end, token >> token, qi::blank);

Here, you can add a negative lookahead assertion inside a lexeme to assert that "the end of the token was reached". In your grammar, that is expressed as !qi::graph:

    auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);

See a demo:

Live On Coliru

#include <iostream>
#include <iomanip>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

int main() {
    for (std::string const str : { "ab", " ab ", " a b ", "a b" }) {
        auto iter = str.begin(), end = str.end();

        auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);

        bool r = qi::phrase_parse(iter, end, token >> token, qi::blank);

        std::cout << " --- " << std::quoted(str) << " --- ";
        if (r) {
            std::cout << "parse succeeded.";
        } else {
            std::cout << "parse failed.";
        }

        if (iter != end) {
            std::cout << " Remaining unparsed: " << std::string(iter, end);
        }

        std::cout << std::endl;
    }
}

Prints

 --- "ab" --- parse failed. Remaining unparsed: ab
 --- " ab " --- parse failed. Remaining unparsed:  ab 
 --- " a b " --- parse succeeded.
 --- "a b" --- parse succeeded.

BONUS Review notes

My guidelines would be:

  1. Your skipper should be the grammar's responsibility. It's sad that all the Qi samples lead people to believe they need to let the caller decide that.
  2. End-iterator checking does not equal error checking. It's very possible to parse things correctly without consuming all input, which is why the "remaining input" should be reported whenever there is any, not only when parsing failed.
  3. If trailing unparsed input is an error, spell it out:

Live On Coliru

#include <iostream>
#include <iomanip>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

int main() {
    for (std::string const str : { "ab", " ab ", " a b ", "a b happy trees are trailing" }) {
        auto iter = str.begin(), end = str.end();

        auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);

        bool r = qi::parse(iter, end, qi::skip(qi::space) [ token >> token >> qi::eoi ]);

        std::cout << " --- " << std::quoted(str) << " --- ";
        if (r) {
            std::cout << "parse succeeded.";
        } else {
            std::cout << "parse failed.";
        }

        if (iter != end) {
            std::cout << " Remaining unparsed: " << std::quoted(std::string(iter, end));
        }

        std::cout << std::endl;
    }
}

Prints

 --- "ab" --- parse failed. Remaining unparsed: "ab"
 --- " ab " --- parse failed. Remaining unparsed: " ab "
 --- " a b " --- parse succeeded.
 --- "a b happy trees are trailing" --- parse failed. Remaining unparsed: "a b happy trees are trailing"

Upvotes: 4
