Reputation: 35
Boost Spirit newcomer here.
I have a string in the form of "Key:Value\r\nKey2:Value2\r\n" that I'm trying to parse. In that specific form, it's trivial to parse with Boost Spirit. However, in order to be more robust, I also need to handle cases such as this one:
" My Key : Value \r\n My2ndKey : Long<4 spaces>Value \r\n"
In this case, I need to trim leading and trailing spaces before and after the key/value separators so that I get the following map:
"My Key", "Value"
"My2ndKey", "Long<4 spaces>Value"
I played with qi::hold to achieve this, but I get compile errors because the embedded parser I was trying to use does not support the boost::multi_pass iterator. There has to be a simple way to achieve this.
I read the following articles (and many others on the subject):
http://boost-spirit.com/home/articles/qi-example/parsing-a-list-of-key-value-pairs-using-spirit-qi/
http://boost-spirit.com/home/2010/02/24/parsing-skippers-and-skipping-parsers/
Boost spirit parsing string with leading and trailing whitespace
I am looking for a solution to my problem, which doesn't seem to be entirely covered by those articles. I would also like to better understand how this is achieved. As a small bonus question: I keep seeing the '%=' operator. Is it useful in my case? Is MyRule %= MyRule ... used for recursive parsing?
The code below parses my strings properly except that it doesn't remove the spaces between the last non-space character and the separator. :( The skipper used is qi::blank_type (space without EOL).
Thanks!
template <typename Iterator, typename Skipper>
struct KeyValueParser : qi::grammar<Iterator, std::map<std::string, std::string>(), Skipper> {
    KeyValueParser() : KeyValueParser::base_type(ItemRule) {
        ItemRule = PairRule >> *(qi::lit(END_OF_CMD) >> PairRule);
        PairRule = KeyRule >> PAIR_SEP >> ValueRule;
        KeyRule = +(qi::char_ - qi::lit(PAIR_SEP));
        ValueRule = +(qi::char_ - qi::lit(END_OF_CMD));
    }
    qi::rule<Iterator, std::map<std::string, std::string>(), Skipper> ItemRule;
    qi::rule<Iterator, std::pair<std::string, std::string>(), Skipper> PairRule;
    qi::rule<Iterator, std::string()> KeyRule;
    qi::rule<Iterator, std::string()> ValueRule;
};
Upvotes: 1
Views: 739
Reputation: 155
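You need to use KeyRule = qi::raw[ +(qi::char_ - qi::lit(PAIR_SEP)) ]; and declare KeyRule with the skipper, i.e. as qi::rule<Iterator, std::string(), Skipper>. With the skipper-less declaration from the question, the rule acts as an implicit lexeme, so raw would still capture the trailing spaces.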
In order to see why, let's study several ways to parse the string "a b :".
First, keep in mind how the following parsers/directives work:
- lexeme[subject]: This directive matches subject while disabling the skipper.
- raw[subject]: Discards subject's attribute and returns an iterator pair that points to the matched characters in the input stream.
- +subject: The plus parser tries to match its subject 1 or more times.
- a - b: The difference parser first tries to parse b; if b succeeds, a - b fails. When b fails, it matches a.
- char_: Matches any char. It's a PrimitiveParser.
- lit(':'): Matches ':' but ignores its attribute. It's a PrimitiveParser.
1. lexeme[ +(char_ - lit(':')) ]: By removing the skipper from your rule you have an implicit lexeme. Since there is no skipper, it goes like this:
   'a' -> ':' fails, char_ matches 'a'; the current synthesized attribute is "a"
   ' ' -> ':' fails, char_ matches ' '; the current synthesized attribute is "a "
   'b' -> ':' fails, char_ matches 'b'; the current synthesized attribute is "a b"
   ' ' -> ':' fails, char_ matches ' '; the current synthesized attribute is "a b "
   ':' -> ':' succeeds; the final synthesized attribute is "a b "
2. +(char_ - lit(':')): Since it has a skipper, every PrimitiveParser will pre-skip before being tried:
   'a' -> ':' fails, char_ matches 'a'; the current synthesized attribute is "a"
   ' ' -> this is skipped before ':' is tried
   'b' -> ':' fails, char_ matches 'b'; the current synthesized attribute is "ab"
   ' ' -> this is skipped before ':' is tried
   ':' -> ':' succeeds; the final synthesized attribute is "ab"
3. raw[ +(char_ - lit(':')) ]: The subject is exactly the same as in 2. The raw directive ignores "ab" and returns an iterator pair that goes from 'a' to 'b'. Since the attribute of your rule is std::string, a string is constructed from that iterator pair, resulting in "a b", which is what you want.
Upvotes: 0