Juraj Blaho

Reputation: 13451

Boost.Spirit is not parsing the whole input

I have the following boost::spirit::qi rules:

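#include <boost/spirit/include/qi.hpp>
#include <boost/proto/deep_copy.hpp>
namespace qi = boost::spirit::qi;

auto dquote = qi::char_('"');
auto comma = qi::char_(',');
auto newline = qi::char_('\n');
// deep_copy: a plain `auto` would keep references to destroyed temporaries
auto nonEscaped = boost::proto::deep_copy(*(qi::char_ - newline - comma - dquote));
auto escaped = boost::proto::deep_copy(*qi::blank >> dquote >> *((qi::char_ - dquote) | (dquote >> dquote)) >> dquote >> *qi::blank);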
auto field = nonEscaped | escaped;

When I try to parse an input:

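std::string input(" \"e\"\"e\" ");
auto first = input.begin(); // phrase_parse advances this iterator
bool ok = qi::phrase_parse(first, input.end(), field, qi::char_('\r'));
// ok is true, but `first` stops after the leading space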

Instead of the escaped rule matching the whole input, only the nonEscaped rule is applied, so only the first space is matched. How do I convince Spirit to parse the whole input, or at least as much as possible?

If I swap the order of the alternatives in the field rule as follows, it works. But is that the right solution?

auto field = escaped | nonEscaped;

Upvotes: 2

Views: 270

Answers (1)

sehe

Reputation: 392833

Yes, reordering is the right solution.

Boost.Spirit generates what are known as LL (recursive-descent) parsers, which means

It parses the input from Left to right, and constructs a Leftmost derivation of the sentence (hence LL, compared with LR parser)

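In simple words, the alternative parser a | b tries a first and commits to it as soon as it succeeds; only if a fails does it backtrack and try b from the same position. It never looks for the longest match. Because nonEscaped is a Kleene star, it always succeeds (at worst by matching nothing), so with nonEscaped listed first, escaped is never even tried.

You could 'assert' a post-condition of sorts at the end of the nonEscaped rule, so that it fails when it stops before the end of a field. See Using Semantic Actions: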

  • assigning false to _pass inside a semantic action
  • using a semantic action function object that sets its bool& pass parameter to false to fail the match

However, in practice this leads to suboptimal parsers (e.g. unnecessary backtracking).

HTH

Upvotes: 3
