Max

Reputation: 3180

Boost Spirit failing on empty string input

I am trying to parse the following string and extract the parts inside the parentheses.

This string fails:

_FIND('Something', '')_
Should return
part1 = 'Something'
part2 = ''

This string passes:

_FIND('Something', '*')_
Returns
part1 = 'Something'
part2 = '*'

I assume the problem lies with the "quoted_string" rule:

    find_parser() : find_parser::base_type(start)
    {
        using qi::lit;
        using qi::lexeme;
        using standard_wide::char_;

        /// simple quoted string.
        quoted_string %= lexeme['\'' >> +(char_ - '\'') >> '\''];

        start %=
            -(lit("$(")) // optional
            >> lit("_FIND")
            >> '('
            >> quoted_string
            >> -(',' >> quoted_string) // 2nd parameter optional
            >> ")_"
            >> -(lit(")")) // optional
            ;
    }

I tried adding an "empty" string lexeme like this, but it does not work.

        quoted_string %= lexeme['\'' >> +(char_ - '\'') >> '\''];
        empty_quoted_string %= lexeme['\'' >> +(qi::space - '\'') >> '\''];

        start %=
            lit("_FIND")
            >> '('
            >> (quoted_string|empty_quoted_string)
            >> -(',' >> (quoted_string|empty_quoted_string)) // 2nd parameter optional
            >> ")_"
            ;

I know it must be a simple thing, but I cannot put my finger on it.

Thanks for any inputs, hints or tips.

Upvotes: 1

Views: 116

Answers (1)

sehe

Reputation: 393114

  lexeme['\'' >> +(char_ - '\'') >> '\''];

+p means that p must match one-or-more times. If an empty string must be accepted, use the Kleene-star operator, which allows zero-or-more matches.

  lexeme['\'' >> *(char_ - '\'') >> '\''];

Live Demo

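Some inefficiencies and style issues are resolved.

It also fixes an incorrectness: because the leading "$(" and the trailing ")" were each optional on their own, unbalanced input such as "$(_FIND('')_" or "_FIND('')_)" would parse as "correct".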


#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <iostream>
#include <string>
#include <utility>

using Params = std::pair<std::string, std::string>;

namespace qi = boost::spirit::qi;

template <typename It> 
struct find_parser : qi::grammar<It, Params()> {
    find_parser() : find_parser::base_type(start)
    {
        using namespace qi;

        start = skip(space) [ "$(" >> find >> ")" | find ];

        find
            = '_' >> lit("FIND") >> lit('(')
            >> quoted_string >> -(',' >> quoted_string) // 2nd parameter optional
            >> ')' >> '_'
            ;

        quoted_string = "'" >> *~char_("'") >> "'";

        BOOST_SPIRIT_DEBUG_NODES((start)(find)(quoted_string))
    }

  private:
    qi::rule<It, Params()> start;

    // rules with skipper
    qi::rule<It, Params(), qi::space_type> find;

    // implicit lexemes
    qi::rule<It, std::string()> quoted_string;
};

int main() {
    using It = std::string::const_iterator;
    find_parser<It> const p;

    for (std::string const input : {
            "_FIND('Something', 'Something else')_",
            "_ FIND('Something', 'Something else') _",
            "$(_FIND('Something', 'Something else')_)",
            "$( _FIND( 'Something', 'Something else' )_ )",
            // second arg empty
            "_FIND('Something', '')_",
            "_ FIND('Something', '') _",
            "$(_FIND('Something', '')_)",
            "$( _FIND( 'Something', '' )_ )",
            // optional args omitted
            "_FIND('Something')_",
            "_ FIND('Something') _",
            "$(_FIND('Something')_)",
            "$( _FIND( 'Something' )_ )",
            })
    {
        std::cout << "-------- " << input << " ------------\n";

        It f = input.begin(), l = input.end();
        Params parsed;
        if (parse(f, l, p, parsed))
            std::cout << "Parsed: '" << parsed.first << "', '" << parsed.second << "'\n";
        else
            std::cout << "Parsing failed\n";

        if (f!=l)
            std::cout << "  -- Remaining unparsed: '" << std::string(f,l) << "'\n";
    }
}

Upvotes: 1
