Reputation: 47
I don't have a whole lot of code to show for this one because I haven't managed to get anything to work, but the high-level problem is that I am trying to create a series of parsers for a family of related languages. What I mean by this is that the languages will share many of the same constructs, but there won't be complete overlap. As a simple example, say I have an AST that is parameterized by some (completely contrived in this example) 'leaf' type:
template <typename t>
struct fooT {
std::string name;
t leaf;
};
One language may have t instantiated as int and one as double. What I wanted to do was create a templated class or something that I could instantiate with different t's and corresponding parser rules so that I could generate a series of composed parsers.
In my real example, I have a bunch of nested structures that are the same across the languages, but only have a couple of small variations at the very edges of the AST, so if I cannot compose the parsers in a good way, I will end up duplicating a bunch of parse rules, AST nodes, etc. I have actually gotten it to work by not putting it in a class and just very carefully arranging my header files and imports so that I can have 'dangling' parser rules with special names that can be assembled. A big downside of this is that I cannot include parsers for the multiple different languages within the same program -- precisely because of the name conflict that arises.
Does anybody have any ideas how I could approach this?
Upvotes: 2
Views: 376
Reputation: 392833
The nice thing about X3 is that you can generate parsers just as easily as you define them in the first place.
E.g.
template <typename T> struct AstNode {
std::string name;
T leaf;
};
Now let's define a generic parser maker:
namespace Generic {
template <typename T> auto leaf = x3::eps(false);
template <> auto leaf<int>
= "0x" >> x3::uint_parser<uintmax_t, 16>{};
template <> auto leaf<std::string>
= x3::lexeme['"' >> *~x3::char_('"') >> '"'];
auto no_comment = x3::space;
auto hash_comments = x3::space |
x3::lexeme['#' >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);
auto c_style_comments = x3::space |
"/*" >> x3::lexeme[*(x3::char_ - "*/")] >> "*/";
auto cxx_style_comments = c_style_comments |
x3::lexeme["//" >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);
auto name = leaf<std::string>;
template <typename T> auto parseNode(auto heading, auto skipper) {
return x3::skip(skipper)[
x3::as_parser(heading) >> name >> ":" >> leaf<T>
];
}
}
This allows us to compose various grammars with various leaf types and skipper styles:
namespace Language1 {
static auto const grammar =
Generic::parseNode<int>("value", Generic::no_comment);
}
namespace Language2 {
static auto const grammar =
Generic::parseNode<std::string>("line", Generic::cxx_style_comments);
}
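To see the shape of this pattern without the X3 machinery, here is a library-free sketch of the same idea: one leaf parser per type, selected by explicit specialization, plus a generic maker that each language instantiates in its own namespace. Everything in it (Node, Sketch, skipSpace, lit, parseLeaf, parseNode and the Language1/Language2 wrappers) is a hypothetical stand-in written for illustration, not X3 API, and it deliberately ignores comments, escape sequences and integer overflow.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for AstNode<T>; not part of X3.
template <typename T> struct Node {
    std::string name;
    T leaf;
};

namespace Sketch {
    // Skip whitespace: the role a skipper plays in X3.
    inline void skipSpace(std::string const& in, std::size_t& pos) {
        while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos])))
            ++pos;
    }

    // Match a literal after skipping leading whitespace.
    inline bool lit(std::string const& in, std::size_t& pos, std::string const& what) {
        skipSpace(in, pos);
        if (in.compare(pos, what.size(), what) != 0) return false;
        pos += what.size();
        return true;
    }

    // Primary template: a leaf type without a specialization never matches.
    template <typename T>
    bool parseLeaf(std::string const&, std::size_t&, T&) { return false; }

    // int leaves are hex literals such as 0x2A (overflow is not checked).
    template <>
    inline bool parseLeaf<int>(std::string const& in, std::size_t& pos, int& out) {
        if (!lit(in, pos, "0x")) return false;
        std::size_t const start = pos;
        int value = 0;
        while (pos < in.size() && std::isxdigit(static_cast<unsigned char>(in[pos]))) {
            unsigned char const c = static_cast<unsigned char>(in[pos++]);
            int const digit = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
            value = value * 16 + digit;
        }
        if (pos == start) return false; // "0x" with no digits
        out = value;
        return true;
    }

    // std::string leaves are double-quoted, with no escape sequences.
    template <>
    inline bool parseLeaf<std::string>(std::string const& in, std::size_t& pos, std::string& out) {
        if (!lit(in, pos, "\"")) return false;
        std::size_t const close = in.find('"', pos);
        if (close == std::string::npos) return false;
        out = in.substr(pos, close - pos);
        pos = close + 1;
        return true;
    }

    // Generic maker: heading, quoted name, ':' and a leaf chosen by T.
    template <typename T>
    bool parseNode(std::string const& heading, std::string const& in, Node<T>& out) {
        std::size_t pos = 0;
        Node<T> node{};
        if (!lit(in, pos, heading)) return false;
        if (!parseLeaf<std::string>(in, pos, node.name)) return false;
        if (!lit(in, pos, ":")) return false;
        if (!parseLeaf<T>(in, pos, node.leaf)) return false;
        out = node; // only written on success
        return true;
    }
}

// Each language picks its own heading and leaf type; the shared name
// "parse" does not clash because it lives in separate namespaces.
namespace Language1 {
    inline bool parse(std::string const& in, Node<int>& out) {
        return Sketch::parseNode<int>("value", in, out);
    }
}
namespace Language2 {
    inline bool parse(std::string const& in, Node<std::string>& out) {
        return Sketch::parseNode<std::string>("line", in, out);
    }
}
```

Calling Language1::parse and Language2::parse from the same translation unit works because only the leaf and the heading differ, while the node structure is written once. That is the property the X3 version gets from leaf<T> and parseNode<T>.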
Let's Demo:
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <string_view>
namespace x3 = boost::spirit::x3;
template <typename T> struct AstNode {
std::string name;
T leaf;
};
BOOST_FUSION_ADAPT_TPL_STRUCT((T), (AstNode)(T), name, leaf)
namespace Generic {
template <typename T> auto leaf = x3::eps(false);
template <> auto leaf<int>
= "0x" >> x3::uint_parser<uintmax_t, 16>{};
template <> auto leaf<std::string>
= x3::lexeme['"' >> *~x3::char_('"') >> '"'];
auto no_comment = x3::space;
auto hash_comments = x3::space |
x3::lexeme['#' >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);
auto c_style_comments = x3::space |
"/*" >> x3::lexeme[*(x3::char_ - "*/")] >> "*/";
auto cxx_style_comments = c_style_comments |
x3::lexeme["//" >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);
auto name = leaf<std::string>;
template <typename T> auto parseNode(auto heading, auto skipper) {
return x3::skip(skipper)[
x3::as_parser(heading) >> name >> ":" >> leaf<T>
];
}
}
namespace Language1 {
static auto const grammar =
Generic::parseNode<int>("value", Generic::no_comment);
}
namespace Language2 {
static auto const grammar =
Generic::parseNode<std::string>("line", Generic::cxx_style_comments);
}
void test(auto const& grammar, std::string_view text, auto ast) {
auto f = text.begin(), l = text.end();
std::cout << "\nParsing: " << std::quoted(text, '\'') << "\n";
if (parse(f, l, grammar, ast)) {
std::cout << " -> {name:" << ast.name << ",value:" << ast.leaf << "}\n";
} else {
std::cout << " -- Failed " << std::quoted(text, '\'') << "\n";
}
}
int main() {
test(Language1::grammar, R"(value "one": 0x01)", AstNode<int>{});
test(
Language2::grammar,
R"(line "Hamlet": "There is nothing either good or bad, but thinking makes it so.")",
AstNode<std::string>{});
test(
Language2::grammar,
R"(line // rejected: "Hamlet": "To be ..."
"King Lear": /*hopefully less trite:*/"As flies to wanton boys are we to the gods")",
AstNode<std::string>{});
}
Prints
Parsing: 'value "one": 0x01'
-> {name:one,value:1}
Parsing: 'line "Hamlet": "There is nothing either good or bad, but thinking makes it so."'
-> {name:Hamlet,value:There is nothing either good or bad, but thinking makes it so.}
Parsing: 'line // rejected: "Hamlet": "To be ..."
"King Lear": /*hopefully less trite:*/"As flies to wanton boys are we to the gods"'
-> {name:King Lear,value:As flies to wanton boys are we to the gods}
For advanced scenarios (where you separate rule declarations and definitions across translation units and/or you require dynamic switching), you can use the x3::any_parser<> holder.
Upvotes: 1