Using boost::spirit::qi, I'm trying to parse lines consisting of a label followed by a variable number of delimited tokens. I call the grammar with phrase_parse and use the provided blank parser as the skipper, so that newlines are preserved: I need to make sure the label is the first item on each line.
The simple base case:
label token, token, token
Can be parsed with the grammar:
line = label >> (token % ',') >> eol;
The problem I am facing is that the grammar should accept zero or more tokens and that tokens may be empty. The grammar should accept the following lines:
label
label ,
label , token
label token, , token,
I have not managed to create a grammar that accepts all examples above. Any suggestions on how to solve this?
Edit:
Thanks to sehe for all the input on the problem stated above. Now for the fun part that I forgot to include: the grammar should also accept empty lines and split lines (tokens without a label). When I try to make the label optional, I get an infinite loop matching the empty string.
label
label token
token
Upvotes: 3
Views: 1378
You should be able to accept the empty list with
line = label >> -(token % ',') >> eol;
Note that eol won't work if your skipper skips eol too (so don't use qi::space but e.g. qi::blank for this purpose).
Also, depending on the definition of token, you may need to change it to accept the "empty" token as well.
In response to the comment, here is a fully working sample (Live On Coliru):
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

int main()
{
    using namespace qi;
    using It     = std::string::const_iterator;
    using Token  = std::string;
    using Tokens = std::vector<Token>;

    // A label is one or more characters up to the first ':'.
    rule<It, blank_type> label
        = lexeme[+~char_(":")] >> ':'
        ;

    // A token may be empty: zero or more characters other than ',' and newline.
    rule<It, Token(), blank_type> token
        = lexeme[*~char_(",\n")]
        ;

    // The skipper is blank (not space), so eol is left for the grammar to match.
    rule<It, Tokens(), blank_type> line
        = label >> -(token % ',') >> eol
        ;

    for (std::string const input : {
            "my first label: 123, 234, 345 with spaces\n",
            "1:\n",
            "2: \n",
            "3: ,,,\n",
            "4: , \t ,,\n",
            "5: , \t , something something,\n",
        })
    {
        std::cout << std::string(40, '=') << "\nparsing: '" << input << "'\n";

        Tokens parsed;
        auto f = input.begin(), l = input.end();
        bool ok = phrase_parse(f, l, line, blank, parsed);

        if (ok)
        {
            std::cout << "Tokens parsed successfully, number parsed: " << parsed.size() << "\n";
            for (auto const& value : parsed)
                std::cout << "token value '" << value << "'\n";
        }
        else
            std::cout << "Parse failed\n";

        // Report anything the grammar did not consume.
        if (f != l)
            std::cout << "Remaining input: '" << std::string(f, l) << "'\n";
    }
}
Output:
========================================
parsing: 'my first label: 123, 234, 345 with spaces
'
Tokens parsed successfully, number parsed: 3
token value '123'
token value '234'
token value '345 with spaces'
========================================
parsing: '1:
'
Tokens parsed successfully, number parsed: 1
token value ''
========================================
parsing: '2:
'
Tokens parsed successfully, number parsed: 1
token value ''
========================================
parsing: '3: ,,,
'
Tokens parsed successfully, number parsed: 4
token value ''
token value ''
token value ''
token value ''
========================================
parsing: '4: , ,,
'
Tokens parsed successfully, number parsed: 4
token value ''
token value ''
token value ''
token value ''
========================================
parsing: '5: , , something something,
'
Tokens parsed successfully, number parsed: 4
token value ''
token value ''
token value 'something something'
token value ''
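The follow-up in the question's edit (empty lines and lines without a label) is not covered by the sample above. As a cross-check for whatever grammar ends up handling those cases, here is a minimal hand-written splitter in plain C++ rather than Spirit. It is only a sketch under assumptions: it reuses the ':'-terminated label convention from the sample, treats a line without ':' as token-only, trims blanks on both sides of each token, and does not reject an empty label. ParsedLine, trim_blanks and parse_line are names invented for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for one input line (not part of the original sample).
struct ParsedLine {
    bool has_label = false;            // false for token-only and empty lines
    std::string label;                 // text before the first ':', blanks trimmed
    std::vector<std::string> tokens;   // may contain empty strings
};

// Strip leading and trailing spaces/tabs, the characters qi::blank would skip.
inline std::string trim_blanks(std::string const& s)
{
    std::size_t const first = s.find_first_not_of(" \t");
    if (first == std::string::npos)
        return std::string();
    std::size_t const last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Split one line (without its trailing newline) into an optional label
// and a comma-separated list of possibly empty tokens.
inline ParsedLine parse_line(std::string const& line)
{
    ParsedLine result;
    std::size_t pos = 0;

    // Assumption: a ':' anywhere on the line terminates the label.
    std::size_t const colon = line.find(':');
    if (colon != std::string::npos) {
        result.has_label = true;
        result.label = trim_blanks(line.substr(0, colon));
        pos = colon + 1;
    }

    // N commas always produce N + 1 tokens, so even an empty line yields
    // one empty token, like the "1:" case in the output above.
    for (;;) {
        std::size_t const comma = line.find(',', pos);
        if (comma == std::string::npos) {
            result.tokens.push_back(trim_blanks(line.substr(pos)));
            break;
        }
        result.tokens.push_back(trim_blanks(line.substr(pos, comma - pos)));
        pos = comma + 1;
    }
    return result;
}
```

For example, parse_line("3: ,,,") gives the label "3" and four empty tokens, while parse_line("token a, token b") has no label and two tokens. Because every step of the loop consumes a comma or the rest of the line, it cannot spin on an empty match, which is the property an optional-label grammar would also need.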
Upvotes: 3