Reputation: 1525
In the grammar i'm implementing, there are elements separated by whitespace. With a skip parser, the spaces between the elements are skipped automatically, but this also allows no space, which is not what i want. Sure, i could explicitly write a grammar that includes these spaces, but it seems to me (with the complexity and flexibility offered by spirit) that there is a better way to do this. Is there? Here is an example:
#include <cstdlib>
#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main(int argc, char** argv)
{
if(argc != 2)
{
std::exit(1);
}
std::string str = argv[1];
auto iter = str.begin();
bool r = qi::phrase_parse(iter, str.end(), qi::char_ >> qi::char_, qi::blank);
if (r && iter == str.end())
{
std::cout << "parse succeeded\n";
}
else
{
std::cout << "parse failed. Remaining unparsed: " << std::string(iter, str.end()) << '\n';
}
}
This allows ab
as well as a b
. I want only the latter to be allowed.
Related to this: How do the skip parsers work, exactly? One supplies something like qi::blank, is then the kleene star applied to form the skip parser? I would like to get some enlightenment here, maybe this also helps on solving this problem.
Additional information: My real parser looks something like this:
one = char_("X") >> repeat(2)[omit[+blank] >> +alnum] >> qi::omit[+qi::blank] >> +alnum;
two = char_("Y") >> repeat(3)[omit[+blank] >> +alnum];
three = char_("Z") >> repeat(4)[omit[+blank] >> +alnum] >> qi::omit[+qi::blank] >> +alnum;
main = one | two | three;
which makes the grammar quite noisy, which i would like to avoid.
Upvotes: 1
Views: 1310
Reputation: 392833
First off, the grammar specs I usually see this kind of requirement in are (always?) RFCs. In 99% of cases there is no issue, consider e.g.:
myrule = skip(space) [ uint_ >> uint_ ];
This already implicitly requires at least 1 whitespace character between the numbers, for the simple reason that there would be 1 number, otherwise. The same simplification occurs in surprisingly many cases (see e.g. the simplifications made around the ubiquitous WSP productions in this answer last week Boost.Spirit qi value sequence vector).
With that out of the way, skippers apply zero or more times, by definition, so no there is not a way to get what you want with an existing stateful directive like skip()
. See also http://stackoverflow.com/questions/17072987/boost-spirit-skipper-issues/17073965#17073965 or the docs - under lexeme
, [no_]skip
and skip_flag::dont_postskip
).
Looking at your specific grammar, I'd do this:
bool r = qi::phrase_parse(iter, end, token >> token, qi::blank);
Here, you can add a negative lookahead assertion inside a lexeme to assert that "the end of the token was reached" - which in your parser would be mandated as !qi::graph
:
auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);
See a demo:
#include <iostream>
#include <iomanip>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main() {
for (std::string const str : { "ab", " ab ", " a b ", "a b" }) {
auto iter = str.begin(), end = str.end();
auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);
bool r = qi::phrase_parse(iter, end, token >> token, qi::blank);
std::cout << " --- " << std::quoted(str) << " --- ";
if (r) {
std::cout << "parse succeeded.";
} else {
std::cout << "parse failed.";
}
if (iter != end) {
std::cout << " Remaining unparsed: " << std::string(iter, str.end());
}
std::cout << std::endl;
}
}
Prints
--- "ab" --- parse failed. Remaining unparsed: ab
--- " ab " --- parse failed. Remaining unparsed: ab
--- " a b " --- parse succeeded.
--- "a b" --- parse succeeded.
My guidelines would be:
#include <iostream>
#include <iomanip>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
int main() {
for (std::string const str : { "ab", " ab ", " a b ", "a b happy trees are trailing" }) {
auto iter = str.begin(), end = str.end();
auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);
bool r = qi::parse(iter, end, qi::skip(qi::space) [ token >> token >> qi::eoi ]);
std::cout << " --- " << std::quoted(str) << " --- ";
if (r) {
std::cout << "parse succeeded.";
} else {
std::cout << "parse failed.";
}
if (iter != end) {
std::cout << " Remaining unparsed: " << std::quoted(std::string(iter, str.end()));
}
std::cout << std::endl;
}
}
Prints
--- "ab" --- parse failed. Remaining unparsed: "ab"
--- " ab " --- parse failed. Remaining unparsed: " ab "
--- " a b " --- parse succeeded.
--- "a b happy trees are trailing" --- parse failed. Remaining unparsed: "a b happy trees are trailing"
Upvotes: 4