XanderLo

Reputation: 35

Boost Spirit - Trimming spaces between last character and separator

Boost Spirit newcomer here.

I have a string in the form of "Key:Value\r\nKey2:Value2\r\n" that I'm trying to parse. In that specific form, it's trivial to parse with Boost Spirit. However, in order to be more robust, I also need to handle cases such as this one:

" My Key : Value \r\n My2ndKey : Long<4 spaces>Value \r\n"

In this case, I need to trim leading and trailing spaces before and after the key/value separators so that I get the following map:

"My Key", "Value"

"My2ndKey", "Long<4 spaces>Value"

I experimented with qi::hold to achieve this, but I get compile errors because the embedded parser I was trying to use does not support the boost::multi_pass iterator. There has to be a simple way to achieve this.

I read the following articles (and many others on the subject):

http://boost-spirit.com/home/articles/qi-example/parsing-a-list-of-key-value-pairs-using-spirit-qi/

http://boost-spirit.com/home/2010/02/24/parsing-skippers-and-skipping-parsers/

Boost spirit parsing string with leading and trailing whitespace

I am looking for a solution to my problem, which doesn't seem to be entirely covered by those articles. I would also like to better understand how this is achieved. As a small bonus question: I keep seeing the '%=' operator. Is it useful in my case? Is MyRule %= MyRule ... used for recursive parsing?

The code below parses my strings properly except that it doesn't remove the spaces between the last non-space character and the separator. :( The skipper used is qi::blank_type (space without EOL).

Thanks!

template <typename Iterator, typename Skipper>
struct KeyValueParser : qi::grammar<Iterator, std::map<std::string, std::string>(), Skipper> {
  KeyValueParser() : KeyValueParser::base_type(ItemRule) {
    ItemRule = PairRule >> *(qi::lit(END_OF_CMD) >> PairRule);
    PairRule = KeyRule >> PAIR_SEP >> ValueRule;
    KeyRule = +(qi::char_ - qi::lit(PAIR_SEP));
    ValueRule = +(qi::char_ - qi::lit(END_OF_CMD));
  }
  qi::rule<Iterator, std::map<std::string, std::string>(), Skipper> ItemRule;
  qi::rule<Iterator, std::pair<std::string, std::string>(), Skipper> PairRule;
  qi::rule<Iterator, std::string()> KeyRule;
  qi::rule<Iterator, std::string()> ValueRule;
};

Upvotes: 1

Views: 739

Answers (1)

llonesmiz

Reputation: 155

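You need to use KeyRule = qi::raw[ +(qi::char_ - qi::lit(PAIR_SEP)) ]; and declare KeyRule with the skipper (qi::rule<Iterator, std::string(), Skipper> KeyRule;), because raw only trims the trailing spaces when its subject is allowed to skip. The same change applies to ValueRule.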


To see why, let's look at three ways to parse the string "a b :".

First let's keep in mind how the following parsers/directives work:

  • lexeme[subject]: This directive matches subject while disabling the skipper.

  • raw[subject]: Discards subject's attribute and returns an iterator pair that points to the matched characters in the input stream.

  • +subject: The plus parser matches its subject one or more times.

  • a-b: The difference parser first tries to parse b. If b succeeds, a-b fails; if b fails, it tries to match a.

  • char_: matches any char. It's a PrimitiveParser.

  • lit(':'): matches ':' but exposes no attribute. It's a PrimitiveParser.


  1. lexeme[ +(char_ - lit(':')) ]: by removing the skipper from your rule you have an implicit lexeme. Since there is no skipper it goes like this:

'a' -> ':' fails, char_ matches 'a', the current synthesized attribute is "a"
' ' -> ':' fails, char_ matches ' ', the current synthesized attribute is "a "
'b' -> ':' fails, char_ matches 'b', the current synthesized attribute is "a b"
' ' -> ':' fails, char_ matches ' ', the current synthesized attribute is "a b "
':' -> ':' succeeds, the final synthesized attribute is "a b "


  2. +(char_ - lit(':')): Since it has a skipper, every PrimitiveParser will pre-skip before being tried:

'a' -> ':' fails, char_ matches 'a', the current synthesized attribute is "a"
' ' -> this is skipped before ':' is tried
'b' -> ':' fails, char_ matches 'b', the current synthesized attribute is "ab"
' ' -> this is skipped before ':' is tried
':' -> ':' succeeds, the final synthesized attribute is "ab"


  3. raw[ +(char_ - lit(':')) ]: The subject is exactly the same as in 2. The raw directive ignores "ab" and returns an iterator pair that goes from 'a' to 'b'. Since the attribute of your rule is std::string, a string is constructed from that iterator pair, resulting in "a b", which is what you want.

Upvotes: 0
