Reputation: 153
I need to parse something like this (YAML):
- from: src
  to:
    - target1
    - target2
- from: src2
  to:
    - target3
    - target4
I tried something like this (simplified pseudocode):
identifierRule = +alnum;
fromToRule = lit("-") >>
    (
        "from:" >> identifierRule >> qi::eol >>
        (
            ("to: " >> qi::eol >> +(qi::repeat(indention)[qi::blank] >> "-" >> identifierRule >> qi::eol))
        )
    );
But with this approach the second 'from' entry is parsed as an additional entry of the first 'to' list instead of as a new, separate entry. Is there any way to retrieve the current indentation level and use it as additional information in a rule?
Reputation: 392999
Of course you should be using a YAML library (e.g. yaml-cpp), because YAML is much more versatile and ... riddled with parser idiosyncrasies. Don't roll your own.
However, assuming you're trying to learn Spirit Qi, there's merit to the question.
It's not at all trivial though, and a lot of it depends on what you want to be parsing into. Focusing only on the input shown, I'd imagine an AST like:
using Key = std::string;
using Raw = std::string;
using Value = boost::make_recursive_variant< //
    Raw,                                      //
    std::map<Key, boost::recursive_variant_>, //
    std::vector<boost::recursive_variant_>>::type;
using List = std::vector<Value>;
using Dict = std::map<Key, Value>;
So "- a\n- b" would be a list, "a: b\nc: d" would be a dict, and anything else is a raw value.
To be able to nest, let's create rules that are parameterized by a level number:
using Entry = std::pair<Key, Value>;
qi::rule<It, Value()> start;
qi::rule<It, Value(int)> value;
qi::rule<It, List(int)> list;
qi::rule<It, Dict(int)> dict;
qi::rule<It, Entry(int)> entry;
qi::rule<It, Key()> key;
qi::rule<It, Raw()> rawvalue;
qi::rule<It, void(int)> linebreak_;
Only key and rawvalue never contain a newline, so they don't need the parameter. linebreak_ doesn't expose an attribute, but it is made a rule so that we can enable debug output for it.
Now, leaning on a lot of experience I might write the rules as follows:
using namespace qi;
_r1_type level; // friendly name for inherited attribute
auto nested = level + 1;
First things first, so we can keep it "readable". Right away, some of the helpers:
linebreak_ = *blank >> eol >> repeat(level)["  "]; // two spaces per level
auto linebreak = linebreak_(level);
auto identchar = copy(char_("a-zA-Z0-9_"));
We help ourselves with shorthands so we don't have to repeat ourselves. However, note the subtle presence of qi::copy (which is proto::deep_copy; see e.g. Assigning parsers to auto variables).
Now we can write the rules pretty much "naively":
key = (identchar - digit) >> *identchar;
rawvalue = omit[*blank >> &graph] >> *(char_ - eol);
The only vagueness here is that blank space at the beginning of a raw value is silently omitted. Now, let's continue top-down with the level-aware productions:
start = value(0);
value = *linebreak >> (list(level) | dict(level) | rawvalue);
We start with list, because it's the most recognizable, thanks to its "- " prefix:
list = ("- " >> value(nested)) % linebreak;
Remember, nested is just the Phoenix expression for level + 1.
dict = entry(level) % linebreak;
Dicts keep the same level for all entries.
entry = key >> skip(blank)[":"] >> value(nested);
Note that we tolerate insignificant blank space around the ":".
Everything rolled together:
template <typename It> struct Parser : qi::grammar<It, Value()> {
    Parser() : Parser::base_type(start) {
        using namespace qi;
        _r1_type level; // friendly name for inherited attribute
        auto nested = level + 1;
        linebreak_ = *blank >> eol >> repeat(level)["  "]; // two spaces per level
        auto linebreak = linebreak_(level);
        auto identchar = copy(char_("a-zA-Z0-9_"));
        key = (identchar - digit) >> *identchar;
        rawvalue = omit[*blank >> &graph] >> *(char_ - eol);
        entry = key >> skip(blank)[":"] >> value(nested);
        dict = entry(level) % linebreak;
        list = ("- " >> value(nested)) % linebreak;
        value = *linebreak >> (list(level) | dict(level) | rawvalue);
        start = value(0);
        BOOST_SPIRIT_DEBUG_NODES(
            (start)(value)(list)(dict)(entry)(rawvalue)(key)/*(linebreak_)*/)
    }
  private:
    using Entry = std::pair<Key, Value>;
    qi::rule<It, Value()> start;
    qi::rule<It, Value(int)> value;
    qi::rule<It, List(int)> list;
    qi::rule<It, Dict(int)> dict;
    qi::rule<It, Entry(int)> entry;
    qi::rule<It, Key()> key;
    qi::rule<It, Raw()> rawvalue;
    qi::rule<It, void(int)> linebreak_;
};
Adding minimal code to print the resulting AST: Live On Compiler Explorer
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/std_pair.hpp> // for map attributes
#include <boost/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <fmt/ostream.h>
#include <fmt/ranges.h>
#include <map>
namespace qi = boost::spirit::qi;
auto sample = R"(- from: src
  to: 
    - target1
    - target2
- from: src2
  to: 
    - target3
    - target4)";
using Key = std::string;
using Raw = std::string;
using Value = boost::make_recursive_variant< //
    Raw,                                      //
    std::map<Key, boost::recursive_variant_>, //
    std::vector<boost::recursive_variant_>>::type;
using List = std::vector<Value>;
using Dict = std::map<Key, Value>;
struct Printer {
    std::ostream& _os;
    std::ostreambuf_iterator<char> _out{_os};
    Printer(std::ostream& os) : _os(os) {}
    template <typename... Ts>
    auto operator()(boost::variant<Ts...> const& v) const { boost::apply_visitor(*this, v); }
    auto operator()(auto const& v) const { return fmt::format_to(_out, "{}", v); }
};
template <> struct fmt::formatter<Value> : ostream_formatter {};
static inline std::ostream& operator<<(std::ostream& os, Value const& v) {
    Printer{os}(v);
    return os;
}
template <typename It> struct Parser : qi::grammar<It, Value()> {
    Parser() : Parser::base_type(start) {
        using namespace qi;
        _r1_type level; // friendly name for inherited attribute
        auto nested = level + 1;
        linebreak_ = *blank >> eol >> repeat(level)["  "]; // two spaces per level
        auto linebreak = linebreak_(level);
        auto identchar = copy(char_("a-zA-Z0-9_"));
        key = (identchar - digit) >> *identchar;
        rawvalue = omit[*blank >> &graph] >> *(char_ - eol);
        entry = key >> skip(blank)[":"] >> value(nested);
        dict = entry(level) % linebreak;
        list = ("- " >> value(nested)) % linebreak;
        value = *linebreak >> (list(level) | dict(level) | rawvalue);
        start = value(0);
        BOOST_SPIRIT_DEBUG_NODES(
            (start)(value)(list)(dict)(entry)(rawvalue)(key)/*(linebreak_)*/)
    }
  private:
    using Entry = std::pair<Key, Value>;
    qi::rule<It, Value()> start;
    qi::rule<It, Value(int)> value;
    qi::rule<It, List(int)> list;
    qi::rule<It, Dict(int)> dict;
    qi::rule<It, Entry(int)> entry;
    qi::rule<It, Key()> key;
    qi::rule<It, Raw()> rawvalue;
    qi::rule<It, void(int)> linebreak_;
};
int main() {
    for (std::string const input : {sample}) {
        auto f = begin(input), l = end(input);
        Parser<decltype(f)> p;
        if (Value v; parse(f, l, p, v)) {
            fmt::print("Parsed: {}\n", v);
        } else {
            fmt::print("Parse failed\n");
        }
        if (f != l) {
            fmt::print("Remaining: '{}'\n", std::string(f, l));
        }
    }
}
Prints
Parsed: [{"from": src, "to": [target1, target2]}, {"from": src2, "to": [target3, target4]}]
And with BOOST_SPIRIT_DEBUG enabled:
<start>
<try>- from: src\n to: \n </try>
<value>
<try>- from: src\n to: \n </try>
<list>
<try>- from: src\n to: \n </try>
<value>
<try>from: src\n to: \n </try>
<list>
<try>from: src\n to: \n </try>
<fail/>
</list>
<dict>
<try>from: src\n to: \n </try>
<entry>
<try>from: src\n to: \n </try>
<key>
<try>from: src\n to: \n </try>
<success>: src\n to: \n - t</success>
<attributes>[[f, r, o, m]]</attributes>
</key>
<value>
<try> src\n to: \n - ta</try>
<list>
<try> src\n to: \n - ta</try>
<fail/>
</list>
<dict>
<try> src\n to: \n - ta</try>
<entry>
<try> src\n to: \n - ta</try>
<key>
<try> src\n to: \n - ta</try>
<fail/>
</key>
<fail/>
</entry>
<fail/>
</dict>
<rawvalue>
<try> src\n to: \n - ta</try>
<success>\n to: \n - target</success>
<attributes>[[s, r, c]]</attributes>
</rawvalue>
<success>\n to: \n - target</success>
<attributes>[[s, r, c], 2]</attributes>
</value>
<success>\n to: \n - target</success>
<attributes>[[[f, r, o, m], [s, r, c]], 1]</attributes>
</entry>
<entry>
<try>to: \n - target1\n </try>
<key>
<try>to: \n - target1\n </try>
<success>: \n - target1\n </success>
<attributes>[[t, o]]</attributes>
</key>
<value>
<try> \n - target1\n </try>
<list>
<try>- target1\n - targ</try>
<value>
<try>target1\n - target</try>
<list>
<try>target1\n - target</try>
<fail/>
</list>
<dict>
<try>target1\n - target</try>
<entry>
<try>target1\n - target</try>
<key>
<try>target1\n - target</try>
<success>\n - target2\n- fro</success>
<attributes>[[t, a, r, g, e, t, 1]]</attributes>
</key>
<fail/>
</entry>
<fail/>
</dict>
<rawvalue>
<try>target1\n - target</try>
<success>\n - target2\n- fro</success>
<attributes>[[t, a, r, g, e, t, 1]]</attributes>
</rawvalue>
<success>\n - target2\n- fro</success>
<attributes>[[t, a, r, g, e, t, 1], 3]</attributes>
</value>
<value>
<try>target2\n- from: src2</try>
<list>
<try>target2\n- from: src2</try>
<fail/>
</list>
<dict>
<try>target2\n- from: src2</try>
<entry>
<try>target2\n- from: src2</try>
<key>
<try>target2\n- from: src2</try>
<success>\n- from: src2\n to: </success>
<attributes>[[t, a, r, g, e, t, 2]]</attributes>
</key>
<fail/>
</entry>
<fail/>
</dict>
<rawvalue>
<try>target2\n- from: src2</try>
<success>\n- from: src2\n to: </success>
<attributes>[[t, a, r, g, e, t, 2]]</attributes>
</rawvalue>
<success>\n- from: src2\n to: </success>
<attributes>[[t, a, r, g, e, t, 2], 3]</attributes>
</value>
<success>\n- from: src2\n to: </success>
<attributes>[[[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]], 2]</attributes>
</list>
<success>\n- from: src2\n to: </success>
<attributes>[[[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]], 2]</attributes>
</value>
<success>\n- from: src2\n to: </success>
<attributes>[[[t, o], [[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]]], 1]</attributes>
</entry>
<success>\n- from: src2\n to: </success>
<attributes>[[[[f, r, o, m], [s, r, c]], [[t, o], [[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]]]], 1]</attributes>
</dict>
<success>\n- from: src2\n to: </success>
<attributes>[[[[f, r, o, m], [s, r, c]], [[t, o], [[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]]]], 1]</attributes>
</value>
<value>
<try>from: src2\n to: \n </try>
<list>
<try>from: src2\n to: \n </try>
<fail/>
</list>
<dict>
<try>from: src2\n to: \n </try>
<entry>
<try>from: src2\n to: \n </try>
<key>
<try>from: src2\n to: \n </try>
<success>: src2\n to: \n - </success>
<attributes>[[f, r, o, m]]</attributes>
</key>
<value>
<try> src2\n to: \n - t</try>
<list>
<try> src2\n to: \n - t</try>
<fail/>
</list>
<dict>
<try> src2\n to: \n - t</try>
<entry>
<try> src2\n to: \n - t</try>
<key>
<try> src2\n to: \n - t</try>
<fail/>
</key>
<fail/>
</entry>
<fail/>
</dict>
<rawvalue>
<try> src2\n to: \n - t</try>
<success>\n to: \n - target</success>
<attributes>[[s, r, c, 2]]</attributes>
</rawvalue>
<success>\n to: \n - target</success>
<attributes>[[s, r, c, 2], 2]</attributes>
</value>
<success>\n to: \n - target</success>
<attributes>[[[f, r, o, m], [s, r, c, 2]], 1]</attributes>
</entry>
<entry>
<try>to: \n - target3\n </try>
<key>
<try>to: \n - target3\n </try>
<success>: \n - target3\n </success>
<attributes>[[t, o]]</attributes>
</key>
<value>
<try> \n - target3\n </try>
<list>
<try>- target3\n - targ</try>
<value>
<try>target3\n - target</try>
<list>
<try>target3\n - target</try>
<fail/>
</list>
<dict>
<try>target3\n - target</try>
<entry>
<try>target3\n - target</try>
<key>
<try>target3\n - target</try>
<success>\n - target4</success>
<attributes>[[t, a, r, g, e, t, 3]]</attributes>
</key>
<fail/>
</entry>
<fail/>
</dict>
<rawvalue>
<try>target3\n - target</try>
<success>\n - target4</success>
<attributes>[[t, a, r, g, e, t, 3]]</attributes>
</rawvalue>
<success>\n - target4</success>
<attributes>[[t, a, r, g, e, t, 3], 3]</attributes>
</value>
<value>
<try>target4</try>
<list>
<try>target4</try>
<fail/>
</list>
<dict>
<try>target4</try>
<entry>
<try>target4</try>
<key>
<try>target4</try>
<success></success>
<attributes>[[t, a, r, g, e, t, 4]]</attributes>
</key>
<fail/>
</entry>
<fail/>
</dict>
<rawvalue>
<try>target4</try>
<success></success>
<attributes>[[t, a, r, g, e, t, 4]]</attributes>
</rawvalue>
<success></success>
<attributes>[[t, a, r, g, e, t, 4], 3]</attributes>
</value>
<success></success>
<attributes>[[[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]], 2]</attributes>
</list>
<success></success>
<attributes>[[[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]], 2]</attributes>
</value>
<success></success>
<attributes>[[[t, o], [[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]]], 1]</attributes>
</entry>
<success></success>
<attributes>[[[[f, r, o, m], [s, r, c, 2]], [[t, o], [[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]]]], 1]</attributes>
</dict>
<success></success>
<attributes>[[[[f, r, o, m], [s, r, c, 2]], [[t, o], [[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]]]], 1]</attributes>
</value>
<success></success>
<attributes>[[[[[f, r, o, m], [s, r, c]], [[t, o], [[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]]]], [[[f, r, o, m], [s, r, c, 2]], [[t, o], [[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]]]]], 0]</attributes>
</list>
<success></success>
<attributes>[[[[[f, r, o, m], [s, r, c]], [[t, o], [[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]]]], [[[f, r, o, m], [s, r, c, 2]], [[t, o], [[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]]]]], 0]</attributes>
</value>
<success></success>
<attributes>[[[[[f, r, o, m], [s, r, c]], [[t, o], [[t, a, r, g, e, t, 1], [t, a, r, g, e, t, 2]]]], [[[f, r, o, m], [s, r, c, 2]], [[t, o], [[t, a, r, g, e, t, 3], [t, a, r, g, e, t, 4]]]]]]</attributes>
</start>
Parsed: [{"from": src, "to": [target1, target2]}, {"from": src2, "to": [target3, target4]}]