Reputation: 153

Boost spirit parsing indented list of items

I need to parse something like (yaml):

- from: src
  to: 
    - target1
    - target2
- from: src2
  to: 
    - target3
    - target4

I tried something like this (simplified pseudocode):

  identifierRule = +alnum;
  fromToRule = lit("-") >>
    (
      "from:" >> identifierRule >> qi::eol >>
      "to: " >> qi::eol >>
      +(qi::repeat(indention)[qi::blank] >> "-" >> identifierRule >> qi::eol)
    );

But with this approach the second 'from' entry is parsed as an additional item of the first 'to' list, not as a new, separate entry. Is there any way to retrieve the current indentation level and use it as additional information in a rule?

Upvotes: 2

Answers (1)

sehe

Reputation: 392999

Of course you should be using a YAML library (e.g. yaml-cpp), because YAML is much more versatile and ... riddled with parser idiosyncrasies. Don't roll your own.

However, assuming you're trying to learn Spirit Qi, there's merit to the question.

It's not at all trivial though, and a lot of it depends on what you want to be parsing into. Focusing only on the input shown, I'd imagine an AST like:

using Key   = std::string;
using Raw   = std::string;
using Value = boost::make_recursive_variant<  //
    Raw,                                      //
    std::map<Key, boost::recursive_variant_>, //
    std::vector<boost::recursive_variant_>>::type;

using List = std::vector<Value>;
using Dict = std::map<Key, Value>;

So, "- a\n- b" would be a list, "a: b\nc: d" would be a dict and anything else is a raw value.

To be able to nest, let's create rules that are parameterized by a level number:

using Entry = std::pair<Key, Value>;
qi::rule<It, Value()>       start;
qi::rule<It, Value(int)>    value;
qi::rule<It, List(int)>     list;
qi::rule<It, Dict(int)>     dict;
qi::rule<It, Entry(int)>    entry;

qi::rule<It, Key()>     key;
qi::rule<It, Raw()>     rawvalue;
qi::rule<It, void(int)> linebreak_;

Neither key nor rawvalue can contain a newline, so they don't need the level parameter. linebreak_ doesn't expose an attribute, but it is made a rule so we can enable debug output for it.

Now, leaning on a lot of experience I might write the rules as follows:

using namespace qi;
_r1_type level; // friendly name for inherited attribute
auto nested    = level + 1;

First things first, so we can keep it "readable". Right away, some of the helpers:

linebreak_     = *blank >> eol >> repeat(level)["  "];
auto linebreak = linebreak_(level);
auto identchar = copy(char_("a-zA-Z0-9_"));

We use shorthands so we don't have to repeat ourselves. However, note the subtle presence of qi::copy (which is proto::deep_copy, see e.g. Assigning parsers to auto variables).
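To make the effect of the level-aware line break concrete, here is a minimal sketch in plain C++ without Spirit. The function match_linebreak is a hypothetical name introduced only for illustration, and it assumes "\n" is the only end-of-line sequence (qi::eol accepts more). It mirrors *blank >> eol >> repeat(level)["  "]: a line break only matches at a given level if the next line starts with two spaces per level.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of Spirit), mirroring
//   *blank >> eol >> repeat(level)["  "]
// Returns the position just past the consumed indentation, or nullopt
// when the next line does not start with 2*level spaces.
std::optional<std::size_t> match_linebreak(std::string_view in, std::size_t pos, int level) {
    // *blank: trailing blanks before the end of the line
    while (pos < in.size() && (in[pos] == ' ' || in[pos] == '\t')) ++pos;
    // eol: assumption, only "\n" is handled in this sketch
    if (pos >= in.size() || in[pos] != '\n') return std::nullopt;
    ++pos;
    // repeat(level)["  "]: exactly `level` two-space indentation units
    for (int i = 0; i < level; ++i) {
        if (in.substr(pos, 2) != "  ") return std::nullopt;
        pos += 2;
    }
    return pos;
}
```

At the boundary between "    - target2" and "- from: src2", the break fails for levels 2 and 1 and only matches at level 0. That is why the inner list and the enclosing dict both end there, and the next item is picked up by the outermost list.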

Now, we can have the rules pretty much "naively":

key      = (identchar - digit) >> *identchar;
rawvalue = omit[*blank >> &graph] >> *(char_ - eol);

The one vague spot here is that leading blank space is silently dropped from raw values, which is a choice rather than a specified behaviour. Now, let's continue top-down with the level-aware productions:

start    = value(0);
value    = *linebreak >> (list(level) | dict(level) | rawvalue);

We start with list, because it's most recognizable by its "- " prefix:

list     = ("- " >> value(nested)) % linebreak;

Remember nested is just the Phoenix expression for level + 1.

dict     = entry(level) % linebreak;

Dicts keep the same level for all entries.

entry    = key >> skip(blank)[":"] >> value(nested);

Note we tolerate insignificant blank space around :.
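If the Spirit notation obscures the control flow, the same level-passing idea can be sketched as a hand-written recursive descent parser in plain C++. This is not the answer's code. IndentParser and all its member names are hypothetical, it renders the result as a string instead of building the Value AST, and it assumes "\n" is the only end-of-line sequence. Each member function corresponds to one of the rules above and backtracks by restoring pos on failure.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical hand-written analogue of the level-parameterized rules.
struct IndentParser {
    using Result = std::optional<std::string>;
    std::string_view in;
    std::size_t pos = 0;

    bool lit(std::string_view s) {  // match a literal at the current position
        if (in.substr(pos, s.size()) != s) return false;
        pos += s.size();
        return true;
    }
    void blanks() {
        while (pos < in.size() && (in[pos] == ' ' || in[pos] == '\t')) ++pos;
    }
    bool linebreak(int level) {  // *blank >> eol >> repeat(level)["  "]
        std::size_t const save = pos;
        blanks();
        bool ok = lit("\n");
        for (int i = 0; ok && i < level; ++i) ok = lit("  ");
        if (!ok) pos = save;
        return ok;
    }
    Result key() {  // (identchar - digit) >> *identchar
        auto identchar = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
        std::size_t const save = pos;
        if (pos >= in.size() || !identchar(in[pos]) ||
            std::isdigit(static_cast<unsigned char>(in[pos])))
            return std::nullopt;
        while (pos < in.size() && identchar(in[pos])) ++pos;
        return std::string(in.substr(save, pos - save));
    }
    Result rawvalue() {  // omit[*blank >> &graph] >> *(char_ - eol)
        std::size_t const save = pos;
        blanks();
        if (pos >= in.size() || !std::isgraph(static_cast<unsigned char>(in[pos]))) {
            pos = save;
            return std::nullopt;
        }
        std::size_t const first = pos;
        while (pos < in.size() && in[pos] != '\n') ++pos;
        return std::string(in.substr(first, pos - first));
    }
    // item % linebreak(level): a separator only counts if another item follows
    template <typename Item>
    Result separated(int level, Item item, char open, char close) {
        auto first = item();
        if (!first) return std::nullopt;
        std::string out = open + *first;
        for (;;) {
            std::size_t const save = pos;
            if (!linebreak(level)) break;
            auto next = item();
            if (!next) { pos = save; break; }
            out += ", " + *next;
        }
        return out + close;
    }
    Result entry(int level) {  // key >> skip(blank)[":"] >> value(nested)
        std::size_t const save = pos;
        if (auto k = key()) {
            blanks();
            if (lit(":"))
                if (auto v = value(level + 1)) return *k + ": " + *v;
        }
        pos = save;
        return std::nullopt;
    }
    Result list(int level) {  // ("- " >> value(nested)) % linebreak
        auto item = [&]() -> Result {
            std::size_t const save = pos;
            if (lit("- "))
                if (auto v = value(level + 1)) return v;
            pos = save;
            return std::nullopt;
        };
        return separated(level, item, '[', ']');
    }
    Result dict(int level) {  // entry(level) % linebreak
        return separated(level, [&] { return entry(level); }, '{', '}');
    }
    Result value(int level) {  // *linebreak >> (list | dict | rawvalue)
        std::size_t const save = pos;
        while (linebreak(level)) {}
        if (auto v = list(level)) return v;
        if (auto v = dict(level)) return v;
        if (auto v = rawvalue()) return v;
        pos = save;
        return std::nullopt;
    }
};
```

Calling IndentParser{input}.value(0) on the sample from the question should yield [{from: src, to: [target1, target2]}, {from: src2, to: [target3, target4]}]. The design point is the same as in the Spirit grammar: the separator between items depends on the level, so a shallower line ends every deeper list or dict without any explicit "dedent" token.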

Everything rolled together:

template <typename It> struct Parser : qi::grammar<It, Value()> {
    Parser() : Parser::base_type(start) {
        using namespace qi;
        _r1_type level; // friendly name for inherited attribute

        auto nested    = level + 1;
        linebreak_     = *blank >> eol >> repeat(level)["  "];
        auto linebreak = linebreak_(level);
        auto identchar = copy(char_("a-zA-Z0-9_"));

        key      = (identchar - digit) >> *identchar;
        rawvalue = omit[*blank >> &graph] >> *(char_ - eol);
        entry    = key >> skip(blank)[":"] >> value(nested);
        dict     = entry(level) % linebreak;
        list     = ("- " >> value(nested)) % linebreak;
        value    = *linebreak >> (list(level) | dict(level) | rawvalue);
        start    = value(0);

        BOOST_SPIRIT_DEBUG_NODES(
            (start)(value)(list)(dict)(entry)(rawvalue)(key)/*(linebreak_)*/)
    }

  private:
    using Entry = std::pair<Key, Value>;
    qi::rule<It, Value()>    start;
    qi::rule<It, Value(int)> value;
    qi::rule<It, List(int)>  list;
    qi::rule<It, Dict(int)>  dict;
    qi::rule<It, Entry(int)> entry;

    qi::rule<It, Key()>     key;
    qi::rule<It, Raw()>     rawvalue;
    qi::rule<It, void(int)> linebreak_;
};

Adding minimal code to print the resulting AST: Live On Compiler Explorer

//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/std_pair.hpp> // for map attributes
#include <boost/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <fmt/ostream.h>
#include <fmt/ranges.h>
#include <map>
namespace qi = boost::spirit::qi;

auto sample = R"(- from: src
  to: 
    - target1
    - target2
- from: src2
  to: 
    - target3
    - target4)";

using Key   = std::string;
using Raw   = std::string;
using Value = boost::make_recursive_variant<  //
    Raw,                                      //
    std::map<Key, boost::recursive_variant_>, //
    std::vector<boost::recursive_variant_>>::type;

using List = std::vector<Value>;
using Dict = std::map<Key, Value>;

struct Printer {
    std::ostream& _os;
    std::ostreambuf_iterator<char> _out{_os};
    Printer(std::ostream& os) : _os(os) {}

    template <typename... Ts>
    auto operator()(boost::variant<Ts...> const& v) const { boost::apply_visitor(*this, v); }
    auto operator()(auto const& v) const { return fmt::format_to(_out, "{}", v); }
};

template <> struct fmt::formatter<Value> : ostream_formatter {};

static inline std::ostream& operator<<(std::ostream& os, Value const& v) {
    Printer{os}(v);
    return os;
}

template <typename It> struct Parser : qi::grammar<It, Value()> {
    Parser() : Parser::base_type(start) {
        using namespace qi;
        _r1_type level; // friendly name for inherited attribute

        auto nested    = level + 1;
        linebreak_     = *blank >> eol >> repeat(level)["  "];
        auto linebreak = linebreak_(level);
        auto identchar = copy(char_("a-zA-Z0-9_"));

        key      = (identchar - digit) >> *identchar;
        rawvalue = omit[*blank >> &graph] >> *(char_ - eol);
        entry    = key >> skip(blank)[":"] >> value(nested);
        dict     = entry(level) % linebreak;
        list     = ("- " >> value(nested)) % linebreak;
        value    = *linebreak >> (list(level) | dict(level) | rawvalue);
        start    = value(0);

        BOOST_SPIRIT_DEBUG_NODES(
            (start)(value)(list)(dict)(entry)(rawvalue)(key)/*(linebreak_)*/)
    }

private:
    using Entry = std::pair<Key, Value>;
    qi::rule<It, Value()>    start;
    qi::rule<It, Value(int)> value;
    qi::rule<It, List(int)>  list;
    qi::rule<It, Dict(int)>  dict;
    qi::rule<It, Entry(int)> entry;

    qi::rule<It, Key()>     key;
    qi::rule<It, Raw()>     rawvalue;
    qi::rule<It, void(int)> linebreak_;
};

int main() {
    for (std::string const input : {sample}) {
        auto f = begin(input), l = end(input);
        Parser<decltype(f)> p;

        if (Value v; parse(f, l, p, v)) {
            fmt::print("Parsed: {}\n", v);
        } else {
            fmt::print("Parse failed\n");
        }

        if (f != l) {
            fmt::print("Remaining: '{}'\n", std::string(f,l));
        }
    }
}

Prints

Parsed: [{"from": src, "to": [target1, target2]}, {"from": src2, "to": [target3, target4]}]

And with BOOST_SPIRIT_DEBUG enabled:

<start>
  <try>- from: src\n  to: \n </try>
  <value>
    <try>- from: src\n  to: \n </try>
    <list>
      <try>- from: src\n  to: \n </try>
      <value>
        <try>from: src\n  to: \n   </try>
        <list>
          <try>from: src\n  to: \n   </try>
          <fail/>
        </list>
        <dict>
          <try>from: src\n  to: \n   </try>
          <entry>
            <try>from: src\n  to: \n   </try>
            <key>
              <try>from: src\n  to: \n   </try>
              <success>: src\n  to: \n    - t</success>
              <attributes>[[f, r, o, m]]</attributes>
            </key>
            <value>
              <try> src\n  to: \n    - ta</try>
              <list>
                <try> src\n  to: \n    - ta</try>
                <fail/>
              </list>
              <dict>
                <try> src\n  to: \n    - ta</try>
                <entry>
                  <try> src\n  to: \n    - ta</try>
                  <key>
                    <try> src\n  to: \n    - ta</try>
                    <fail/>
                  </key>
                  <fail/>
                </entry>
                <fail/>
              </dict>
              <rawvalue>
                <try> src\n  to: \n    - ta</try>
                <success>\n  to: \n    - target</success>
                <attributes>[[s, r, c]]</attributes>
              </rawvalue>
              <success>\n  to: \n    - target</success>
              <attributes>[[s, r, c], 2]</attributes>
            </value>
            <success>\n  to: \n    - target</success>
            <attributes>[[[f, r, o, m], [s, r, c]], 1]</attributes>
          </entry>
          <entry>
            <try>to: \n    - target1\n </try>
            <key>
              <try>to: \n    - target1\n </try>
              <success>: \n    - target1\n   </success>
              <attributes>[[t, o]]</attributes>
            </key>
            <value>
              <try> \n    - target1\n    </try>
              <list>
                <try>- target1\n    - targ</try>
                <value>
                  <try>target1\n    - target</try>
                  <list>
                    <try>target1\n    - target</try>
                    <fail/>
                  </list>
                  <dict>
                    <try>target1\n    - target</try>
                    <entry>
                      <try>target1\n    - target</try>
                      <key>
                        <try>target1\n    - target</try>
                        <success>\n    - target2\n- fro</success>
                        <attributes>[[t, a, r, g, e, t, 1]]</attributes>
                      </key>
                      <fail/>
                    </entry>
                    <fail/>
                  </dict>
                  <rawvalue>
                    <try>target1\n    - target</try>
                    <success>\n    - target2\n- fro</success>
                    <attributes>[[t, a, r, g, e, t, 1]]</attributes>
                  </rawvalue>
                  <success>\n    - target2\n- fro</success>
                  <attributes>[[t, a, r, g, e, t, 1], 3]</attributes>
                </value>
                <value>
                  <try>target2\n- from: src2</try>
                  <list>
                    <try>target2\n- from: src2</try>
                    <fail/>
                  </list>
                  <dict>
                    <try>target2\n- from: src2</try>
                    <entry>
                      <try>target2\n- from: src2</try>
                      <key>
                        <try>target2\n- from: src2</try>
                        <success>\n- from: src2\n  to: </success>
                        <attributes>[[t, a, r, g, e, t, 2]]</attributes>
                      </key>
                      <fail/>
                    </entry>
                    <fail/>
                  </dict>
                  <rawvalue>
                    <try>target2\n- from: src2</try>
                    <success>\n- from: src2\n  to: </success>
                    <attributes>[[t, a, r, g, e, t, 2]]</attributes>
                  </rawvalue>
                  <success>\n- from: src2\n  to: </success>
                  <attributes>[[t, a, r, g, e, t, 2], 3]</attributes>
                </value>
                <success>\n- from: src2\n  to: </success>
                <attributes>[[[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]], 2]</attributes>
              </list>
              <success>\n- from: src2\n  to: </success>
              <attributes>[[[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]], 2]</attributes>
            </value>
            <success>\n- from: src2\n  to: </success>
            <attributes>[[[t, o], [[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]]], 1]</attributes>
          </entry>
          <success>\n- from: src2\n  to: </success>
          <attributes>[[[[f, r, o, m], [s, r, c]], [[t, o], [[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]]]], 1]</attributes>
        </dict>
        <success>\n- from: src2\n  to: </success>
        <attributes>[[[[f, r, o, m], [s, r, c]], [[t, o], [[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]]]], 1]</attributes>
      </value>
      <value>
        <try>from: src2\n  to: \n  </try>
        <list>
          <try>from: src2\n  to: \n  </try>
          <fail/>
        </list>
        <dict>
          <try>from: src2\n  to: \n  </try>
          <entry>
            <try>from: src2\n  to: \n  </try>
            <key>
              <try>from: src2\n  to: \n  </try>
              <success>: src2\n  to: \n    - </success>
              <attributes>[[f, r, o, m]]</attributes>
            </key>
            <value>
              <try> src2\n  to: \n    - t</try>
              <list>
                <try> src2\n  to: \n    - t</try>
                <fail/>
              </list>
              <dict>
                <try> src2\n  to: \n    - t</try>
                <entry>
                  <try> src2\n  to: \n    - t</try>
                  <key>
                    <try> src2\n  to: \n    - t</try>
                    <fail/>
                  </key>
                  <fail/>
                </entry>
                <fail/>
              </dict>
              <rawvalue>
                <try> src2\n  to: \n    - t</try>
                <success>\n  to: \n    - target</success>
                <attributes>[[s, r, c, 2]]</attributes>
              </rawvalue>
              <success>\n  to: \n    - target</success>
              <attributes>[[s, r, c, 2], 2]</attributes>
            </value>
            <success>\n  to: \n    - target</success>
            <attributes>[[[f, r, o, m], [s, r, c, 2]], 1]</attributes>
          </entry>
          <entry>
            <try>to: \n    - target3\n </try>
            <key>
              <try>to: \n    - target3\n </try>
              <success>: \n    - target3\n   </success>
              <attributes>[[t, o]]</attributes>
            </key>
            <value>
              <try> \n    - target3\n    </try>
              <list>
                <try>- target3\n    - targ</try>
                <value>
                  <try>target3\n    - target</try>
                  <list>
                    <try>target3\n    - target</try>
                    <fail/>
                  </list>
                  <dict>
                    <try>target3\n    - target</try>
                    <entry>
                      <try>target3\n    - target</try>
                      <key>
                        <try>target3\n    - target</try>
                        <success>\n    - target4</success>
                        <attributes>[[t, a, r, g, e, t, 3]]</attributes>
                      </key>
                      <fail/>
                    </entry>
                    <fail/>
                  </dict>
                  <rawvalue>
                    <try>target3\n    - target</try>
                    <success>\n    - target4</success>
                    <attributes>[[t, a, r, g, e, t, 3]]</attributes>
                  </rawvalue>
                  <success>\n    - target4</success>
                  <attributes>[[t, a, r, g, e, t, 3], 3]</attributes>
                </value>
                <value>
                  <try>target4</try>
                  <list>
                    <try>target4</try>
                    <fail/>
                  </list>
                  <dict>
                    <try>target4</try>
                    <entry>
                      <try>target4</try>
                      <key>
                        <try>target4</try>
                        <success></success>
                        <attributes>[[t, a, r, g, e, t, 4]]</attributes>
                      </key>
                      <fail/>
                    </entry>
                    <fail/>
                  </dict>
                  <rawvalue>
                    <try>target4</try>
                    <success></success>
                    <attributes>[[t, a, r, g, e, t, 4]]</attributes>
                  </rawvalue>
                  <success></success>
                  <attributes>[[t, a, r, g, e, t, 4], 3]</attributes>
                </value>
                <success></success>
                <attributes>[[[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]], 2]</attributes>
              </list>
              <success></success>
              <attributes>[[[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]], 2]</attributes>
            </value>
            <success></success>
            <attributes>[[[t, o], [[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]]], 1]</attributes>
          </entry>
          <success></success>
          <attributes>[[[[f, r, o, m], [s, r, c, 2]], [[t, o], [[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]]]], 1]</attributes>
        </dict>
        <success></success>
        <attributes>[[[[f, r, o, m], [s, r, c, 2]], [[t, o], [[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]]]], 1]</attributes>
      </value>
      <success></success>
      <attributes>[[[[[f, r, o, m], [s, r, c]], [[t, o], [[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]]]], [[[f, r, o, m], [s, r, c, 2]], [[t, o], [[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]]]]], 0]</attributes>
    </list>
    <success></success>
    <attributes>[[[[[f, r, o, m], [s, r, c]], [[t, o], [[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]]]], [[[f, r, o, m], [s, r, c, 2]], [[t, o], [[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]]]]], 0]</attributes>
  </value>
  <success></success>
  <attributes>[[[[[f, r, o, m], [s, r, c]], [[t, o], [[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]]]], [[[f, r, o, m], [s, r, c, 2]], [[t, o], [[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]]]]]]</attributes>
</start>
Parsed: [{"from": src, "to": [target1, target2]}, {"from": src2, "to": [target3, target4]}]

Upvotes: 1
