I am new to Boost Spirit and am struggling to create a proper expression to parse the following input (actually the stdout of some command):
^+ line-17532.dyn.kponet.fi 2 7 377 1 +1503us[+9103us] +/- 55ms
I need to parse it into a set of strings and integers stored in variables. Most of the line should simply be parsed into a variable of the appropriate type (string or int). So in the end, I should get:
string: "^+", "line-17532.dyn.kponet.fi", "+1503us", "+9103us", "55ms"
int : 2, 7, 377, 1
The pair
+1503us[+9103us]
can also contain a space:
+503us[ +103us]
and I need the part before the square brackets and the part inside them to be placed in separate strings.
Additionally, time designations can be expressed as ns, ms, us, or s.
I would appreciate examples of how to deal with this, because the available documentation is quite sparse and not cohesive.
A larger piece of the log, along with the headings describing the individual fields:
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^+ ns2.sdi.fi 2 9 377 381 -1476us[-1688us] +/- 72ms
^+ line-17532.dyn.kponet.fi 2 10 377 309 +302us[ +302us] +/- 59ms
^* heh.fi 2 10 377 319 -1171us[-1387us] +/- 50ms
^+ stara.mulimuli.fi 3 10 377 705 -1253us[-1446us] +/- 73ms
This is one of those times I can almost feel some sympathy for people who claim that C++ has just added complexity, and C was really better. C does lose some things, like type safety, but consider what reading this looks like with C's scanf:
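#include <cstdio>

struct record {
    char prefix[256];
    char url[256];
    int a, b, c, d;
    char time1[256];
    char time2[256];
    char time3[256];
};

// Returns true only if all nine fields were converted.
bool read_record(const char* input, record& r) {
    // "%255[^[]" reads up to the '[', the " " after "[" skips an optional
    // space, and "%255[^]]" reads up to the closing ']'.
    return sscanf(input,
                  "%255s %255s %d %d %d %d %255[^[][ %255[^]]] +/- %255s",
                  r.prefix, r.url, &r.a, &r.b, &r.c, &r.d,
                  r.time1, r.time2, r.time3) == 9;
}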
This does, of course, have a couple of potential liabilities:
- It reads into fixed-size arrays of char rather than std::strings.
- scanf and its cousins aren't type safe.
If either of these is really a serious problem for your purposes, you might need a different approach. Given what the code is probably intended to do, though, it's not immediately obvious that either of them is likely to cause a real problem.
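If the fixed-size char buffers or the lack of type safety do matter, one alternative that stays close to the scanf idea is plain iostreams: operator>> for the whitespace-separated fields, and std::getline with '[' and ']' as delimiters for the bracketed pair. This is a sketch of a different technique, not part of the original answer; the Record struct and the parse_line function are hypothetical names.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record type: the same fields as above, but with std::string.
struct Record {
    std::string prefix, url;
    int a = 0, b = 0, c = 0, d = 0;
    std::string time1, time2, time3;
};

// Hypothetical helper: parses one log line with iostreams instead of sscanf.
bool parse_line(const std::string& line, Record& r) {
    std::istringstream in(line);
    if (!(in >> r.prefix >> r.url >> r.a >> r.b >> r.c >> r.d)) return false;
    in >> std::ws;                                        // blanks before the sample
    if (!std::getline(in, r.time1, '[')) return false;   // e.g. "+1503us"
    in >> std::ws;                                        // allows "[ +9103us]"
    if (!std::getline(in, r.time2, ']')) return false;   // e.g. "+9103us"
    std::string plus_minus;
    if (!(in >> plus_minus >> r.time3) || plus_minus != "+/-") return false;
    return true;
}
```

A line without a '[' makes the second std::getline fail, so parse_line returns false instead of silently filling the wrong fields.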
Note: This answer shows a simpler approach, forming a foundation for additional techniques shown by sehe.
Let's enable Spirit debug output, so we can follow the progress of our parses while we're developing them.
#define BOOST_SPIRIT_DEBUG 1
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <cstdint>   // uint32_t
#include <iostream>  // std::cout
#include <string>
namespace qi = boost::spirit::qi;
The first step is to define a structure to hold our parsed log entries.
struct log_entry_t
{
std::string element_0;
std::string element_1;
uint32_t element_2;
uint32_t element_3;
uint32_t element_4;
uint32_t element_5;
std::string element_6;
std::string element_7;
std::string element_8;
};
To be able to use the structure as the attribute of a Spirit grammar, we need to adapt it into a Fusion sequence (see the Spirit tutorials for more information). This is done with BOOST_FUSION_ADAPT_STRUCT:
BOOST_FUSION_ADAPT_STRUCT(
log_entry_t
, (std::string, element_0)
, (std::string, element_1)
, (uint32_t, element_2)
, (uint32_t, element_3)
, (uint32_t, element_4)
, (uint32_t, element_5)
, (std::string, element_6)
, (std::string, element_7)
, (std::string, element_8)
)
Next, we define the grammar for the log entry. Since the individual elements may be separated by whitespace, we want to use phrase parsing, and thus need to specify a skip parser. qi::blank_type is an appropriate skipper, since it matches only spaces and tabs.
However, since each element should be treated as a lexeme, we do not specify a skipper for the element rules.
template <typename Iterator>
struct log_line_parser
: qi::grammar<Iterator, log_entry_t(), qi::blank_type>
{
typedef qi::blank_type skipper_t;
log_line_parser()
: log_line_parser::base_type(log_line)
{
element_0 %= qi::string("^+");
element_1 %= qi::raw[(+qi::char_("-a-zA-Z0-9") % qi::char_('.'))];
element_2 %= qi::uint_;
element_3 %= qi::uint_;
element_4 %= qi::uint_;
element_5 %= qi::uint_;
element_6 %= qi::raw[qi::char_('+') >> qi::uint_ >> time_unit];
element_7 %= qi::raw[qi::char_('+') >> qi::uint_ >> time_unit];
element_8 %= qi::raw[qi::uint_ >> time_unit];
time_unit %= -qi::char_("nmu") >> qi::char_('s');
log_line
%= element_0
>> element_1
>> element_2
>> element_3
>> element_4
>> element_5
>> element_6
>> qi::lit('[') >> element_7 >> qi::lit(']')
>> qi::lit("+/-")
>> element_8
;
init_debug();
}
void init_debug()
{
BOOST_SPIRIT_DEBUG_NODE(element_0);
BOOST_SPIRIT_DEBUG_NODE(element_1);
BOOST_SPIRIT_DEBUG_NODE(element_2);
BOOST_SPIRIT_DEBUG_NODE(element_3);
BOOST_SPIRIT_DEBUG_NODE(element_4);
BOOST_SPIRIT_DEBUG_NODE(element_5);
BOOST_SPIRIT_DEBUG_NODE(element_6);
BOOST_SPIRIT_DEBUG_NODE(element_7);
BOOST_SPIRIT_DEBUG_NODE(element_8);
BOOST_SPIRIT_DEBUG_NODE(time_unit);
BOOST_SPIRIT_DEBUG_NODE(log_line);
}
private:
qi::rule<Iterator, std::string()> element_0;
qi::rule<Iterator, std::string()> element_1;
qi::rule<Iterator, uint32_t()> element_2;
qi::rule<Iterator, uint32_t()> element_3;
qi::rule<Iterator, uint32_t()> element_4;
qi::rule<Iterator, uint32_t()> element_5;
qi::rule<Iterator, std::string()> element_6;
qi::rule<Iterator, std::string()> element_7;
qi::rule<Iterator, std::string()> element_8;
qi::rule<Iterator, std::string()> time_unit;
qi::rule<Iterator, log_entry_t(), skipper_t> log_line;
};
Let's go through some of the rules:
- Element 0: this is a simple string we need to match. Since we want to capture it as well, we need to use the string parser.
- Element 1: we can use the char_ parser to match either a single character or a character set. The + parser operator represents repetition (one or more), and the % (list) parser operator lets us parse several repetitions separated by a separator (in our case a dot).
- Element 2: to parse numbers, we can use the existing numeric parsers.
- Element 6: since we want to capture the whole sequence as a string, we use the raw parser directive.
To determine the resulting attribute type when using parser operators, refer to the reference on compound attribute rules.
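The list operator in element 1 is easiest to understand when spelled out: a % b matches a, then zero or more repetitions of b >> a, and a dangling separator is left unconsumed. The sketch below is a hypothetical hand-written equivalent of raw[+char_("-a-zA-Z0-9") % '.'] in plain C++ without Boost (match_hostname and is_label_char are invented names). It only illustrates what the rule matches; it is not how Spirit implements it.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True for the character set "-a-zA-Z0-9" (a leading '-' is literal in char_).
// std::isalnum is locale-dependent; in the default "C" locale it is ASCII only.
bool is_label_char(char c) {
    return c == '-' || std::isalnum(static_cast<unsigned char>(c));
}

// Hypothetical equivalent of raw[+char_("-a-zA-Z0-9") % '.'].
// On success, advances pos and stores the exact matched text in out.
bool match_hostname(const std::string& s, std::size_t& pos, std::string& out) {
    std::size_t p = pos;
    auto label = [&](std::size_t& q) {   // +char_(...): one or more label chars
        const std::size_t start = q;
        while (q < s.size() && is_label_char(s[q])) ++q;
        return q > start;
    };
    if (!label(p)) return false;         // the first label is mandatory
    for (;;) {                           // *('.' >> label)
        const std::size_t save = p;
        if (p < s.size() && s[p] == '.') {
            ++p;
            if (label(p)) continue;
        }
        p = save;                        // leave a dangling '.' unconsumed
        break;
    }
    out.assign(s, pos, p - pos);         // raw[]: the matched input range
    pos = p;
    return true;
}
```

For "line-17532.dyn.kponet.fi 2 7" this yields "line-17532.dyn.kponet.fi" and stops at the space, where the blank skipper takes over.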
bool test(std::string const& log)
{
std::cout << "Parsing: " << log << "\n\n";
std::string::const_iterator iter(log.begin());
std::string::const_iterator end(log.end());
log_line_parser<std::string::const_iterator> g;
log_entry_t entry;
bool r(qi::phrase_parse(iter, end, g, qi::blank, entry));
std::cout << "-------------------------\n";
if (r && (iter == end)) {
std::cout << "Parsing succeeded\n";
std::cout << entry.element_0 << "\n"
<< entry.element_1 << "\n"
<< entry.element_2 << "\n"
<< entry.element_3 << "\n"
<< entry.element_4 << "\n"
<< entry.element_5 << "\n"
<< entry.element_6 << "\n"
<< entry.element_7 << "\n"
<< entry.element_8 << "\n";
} else {
// show at most 30 characters of the remaining input without advancing past end
std::string::const_iterator some = (end - iter > 30) ? iter + 30 : end;
std::string context(iter, some);
std::cout << "Parsing failed\n";
std::cout << "stopped at: \"" << context << "...\"\n";
}
return r;
}
Finally, let's run a few positive and negative tests on our parser.
int main()
{
bool result(true);
result &= test("^+ line-17532.dyn.kponet.fi 2 7 377 1 +1503us[+9103us] +/- 55ms");
result &= test("^+ line-17532.dyn.kponet.fi 2 7 377 1 +1503us[ +9103us] +/- 55ms");
result &= test("^+ line-17532.dyn.kponet.fi 2 7 377 1 +1503ms[+9103ns] +/- 55s");
result &= !test("^- line-17532.dyn.kponet.fi 2 7 377 1 +1503us[+9103us] +/- 55ms");
result &= !test("^+ line-17532.dyn.kponet.fi 2 7 377 1 +1503us[+9103us] +/- 55 ms");
result &= !test("^+ line-17532.dyn.kponet.fi 2 7 377 1 + 1503us[+9103us] +/- 55ms");
result &= !test("^+ line-17532.dyn.kponet.fi 2 7 +377 1 +1503us[+9103us] +/- 55ms");
result &= !test("^+ line-17532.dyn.kponet.fi 2 7 3 77 1 +1503us[+9103us] +/- 55ms");
result &= !test("^+ line-17532.dyn.kponet.fi 2 7 -377 1 +1503us[+9103us] +/- 55ms");
std::cout << "Test result = " << result << "\n";
return 0;
}
Running this produces a lot of debugging output. Here is the output for the first test:
Parsing: ^+ line-17532.dyn.kponet.fi 2 7 377 1 +1503us[+9103us] +/- 55ms
<log_line>
<try>^+ line-17532.dyn.kp</try>
<element_0>
<try>^+ line-17532.dyn.kp</try>
<success> line-17532.dyn.kpon</success>
<attributes>[[^, +]]</attributes>
</element_0>
<element_1>
<try>line-17532.dyn.kpone</try>
<success> 2 7 377 </success>
<attributes>[[l, i, n, e, -, 1, 7, 5, 3, 2, ., d, y, n, ., k, p, o, n, e, t, ., f, i]]</attributes>
</element_1>
<element_2>
<try>2 7 377 1 </try>
<success> 7 377 1 +</success>
<attributes>[2]</attributes>
</element_2>
<element_3>
<try>7 377 1 +150</try>
<success> 377 1 +1503</success>
<attributes>[7]</attributes>
</element_3>
<element_4>
<try>377 1 +1503us[</try>
<success> 1 +1503us[+91</success>
<attributes>[377]</attributes>
</element_4>
<element_5>
<try>1 +1503us[+9103us]</try>
<success> +1503us[+9103us] </success>
<attributes>[1]</attributes>
</element_5>
<element_6>
<try>+1503us[+9103us] +/-</try>
<time_unit>
<try>us[+9103us] +/- 55</try>
<success>[+9103us] +/- 55ms</success>
<attributes>[[u, s]]</attributes>
</time_unit>
<success>[+9103us] +/- 55ms</success>
<attributes>[[+, 1, 5, 0, 3, u, s]]</attributes>
</element_6>
<element_7>
<try>+9103us] +/- 55ms</try>
<time_unit>
<try>us] +/- 55ms</try>
<success>] +/- 55ms</success>
<attributes>[[u, s]]</attributes>
</time_unit>
<success>] +/- 55ms</success>
<attributes>[[+, 9, 1, 0, 3, u, s]]</attributes>
</element_7>
<element_8>
<try>55ms</try>
<time_unit>
<try>ms</try>
<success></success>
<attributes>[[m, s]]</attributes>
</time_unit>
<success></success>
<attributes>[[5, 5, m, s]]</attributes>
</element_8>
<success></success>
<attributes>[[[^, +], [l, i, n, e, -, 1, 7, 5, 3, 2, ., d, y, n, ., k, p, o, n, e, t, ., f, i], 2, 7, 377, 1, [+, 1, 5, 0, 3, u, s], [+, 9, 1, 0, 3, u, s], [5, 5, m, s]]]</attributes>
</log_line>
-------------------------
Parsing succeeded
^+
line-17532.dyn.kponet.fi
2
7
377
1
+1503us
+9103us
55ms
After all the tests have run, the program prints:
Test result = 1
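Two of the negative tests ("+ 1503us" and "55 ms") fail because the element rules are lexemes: the blank skipper runs only between elements, never inside one. The hypothetical plain-C++ sketch below (no Boost; match_time_sample and skip_blanks are invented names) mirrors element_6/element_7/element_8 and the skipping between elements, to make that behavior concrete. It illustrates the matching logic only and is not how Spirit works internally.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical plain-C++ mirror of the lexeme rules
//   element_6/7: raw[char_('+') >> uint_ >> time_unit]   (with_plus = true)
//   element_8:   raw[uint_ >> time_unit]                 (with_plus = false)
//   time_unit:   -char_("nmu") >> char_('s')
// Nothing is skipped inside: each character must directly follow the previous
// one. Unlike qi::uint_, this sketch does not check for overflow.
bool match_time_sample(const std::string& s, std::size_t& pos,
                       std::string& out, bool with_plus) {
    std::size_t p = pos;
    if (with_plus) {
        if (p >= s.size() || s[p] != '+') return false;   // char_('+')
        ++p;
    }
    const std::size_t digits = p;
    while (p < s.size() && std::isdigit(static_cast<unsigned char>(s[p]))) ++p;
    if (p == digits) return false;                        // uint_: at least one digit
    if (p < s.size() && (s[p] == 'n' || s[p] == 'm' || s[p] == 'u')) ++p;  // -char_("nmu")
    if (p >= s.size() || s[p] != 's') return false;      // char_('s')
    ++p;
    out.assign(s, pos, p - pos);                          // raw[]: the matched text
    pos = p;
    return true;
}

// What the qi::blank skipper does between elements: skip spaces and tabs.
void skip_blanks(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && (s[pos] == ' ' || s[pos] == '\t')) ++pos;
}
```

The space in "+1503us[ +9103us]" is accepted because it sits between '[' and element_7, where skipping happens; the space in "55 ms" is rejected because it sits inside element_8.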
As always, I start by sketching a useful AST:
namespace AST {
using clock = std::chrono::high_resolution_clock;
struct TimeSample {
enum Direction { up, down } direction; // + or -
clock::duration value;
};
struct Record {
std::string prefix; // "^+"
std::string fqdn; // "line-17532.dyn.kponet.fi"
int a, b, c, d; // 2, 7, 377, 1
TimeSample primary, braced;
clock::duration tolerance;
};
}
Now that we know what we want to parse, we mostly just mimic the AST with rules, for a bit:
using namespace qi;
start = skip(blank) [record_];
record_ = prefix_ >> fqdn_ >> int_ >> int_ >> int_ >> int_ >> sample_ >> '[' >> sample_ >> ']' >> tolerance_;
prefix_ = string("^+"); // or whatever you need to match here
fqdn_ = +graph; // or whatever additional constraints you have
sample_ = direction_ >> duration_;
duration_ = (long_ >> units_) [ _val = _1 * _2 ];
tolerance_= "+/-" >> duration_;
Of course, the interesting bits are the units and the direction:
struct directions : qi::symbols<char, AST::TimeSample::Direction> {
directions() { add("+", AST::TimeSample::up)("-", AST::TimeSample::down); }
} direction_;
struct units : qi::symbols<char, AST::clock::duration> {
units() {
using namespace std::literals::chrono_literals;
add("s", 1s)("ms", 1ms)("us", 1us)("µs", 1us)("ns", 1ns);
}
} units_;
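Conceptually, units_ is a lookup table from suffix to duration, and duration_ multiplies the parsed number by the looked-up unit. A plain-C++ sketch of that idea without Boost could look like this (kUnits and parse_duration are hypothetical names). It accepts only unsigned digits, because in the grammar the sign is consumed by direction_ (qi::long_ itself would also accept a sign), and it does not guard against overflow.

```cpp
#include <cctype>
#include <chrono>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

using namespace std::chrono_literals;

// Hypothetical stand-in for the `units` symbol table: suffix -> duration.
const std::map<std::string, std::chrono::nanoseconds> kUnits = {
    {"s", 1s}, {"ms", 1ms}, {"us", 1us}, {"\xC2\xB5s", 1us} /* UTF-8 "µs" */, {"ns", 1ns}};

// Sketch of duration_ = (long_ >> units_)[_val = _1 * _2]: digits immediately
// followed by a unit suffix. Like qi::symbols, it picks the longest matching
// key (with these keys no entry is a prefix of another, so it rarely matters).
std::optional<std::chrono::nanoseconds> parse_duration(const std::string& s, std::size_t& pos) {
    std::size_t p = pos;
    const std::size_t start = p;
    long value = 0;
    while (p < s.size() && std::isdigit(static_cast<unsigned char>(s[p])))
        value = value * 10 + (s[p++] - '0');
    if (p == start) return std::nullopt;               // no number
    const decltype(kUnits)::value_type* best = nullptr;
    for (const auto& entry : kUnits)
        if (s.compare(p, entry.first.size(), entry.first) == 0 &&
            (!best || entry.first.size() > best->first.size()))
            best = &entry;
    if (!best) return std::nullopt;                    // no unit directly after it
    pos = p + best->first.size();
    return value * best->second;                       // e.g. 1503 * 1us
}
```

As in the grammar, "55 ms" is rejected here because the unit must follow the number without a space.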
The whitespace acceptance is governed by a skipper; I chose qi::blank_type for the non-lexeme rules:
using Skipper = qi::blank_type;
qi::rule<It, AST::Record()> start;
qi::rule<It, AST::Record(), Skipper> record_;
qi::rule<It, AST::TimeSample(), Skipper> sample_;
qi::rule<It, AST::clock::duration(), Skipper> duration_, tolerance_;
// lexemes:
qi::rule<It, std::string()> prefix_;
qi::rule<It, std::string()> fqdn_;
Putting it all together, we can use it like this:
int main() {
std::istringstream iss(R"(^+ line-17532.dyn.kponet.fi 2 7 377 1 +1503us[+9103us] +/- 55ms
)");
std::string line;
while (getline(iss, line)) {
auto f = line.cbegin(), l = line.cend();
AST::Record record;
if (parse(f, l, parser<>{}, record))
std::cout << "parsed: " << boost::fusion::as_vector(record) << "\n";
else
std::cout << "parse error\n";
if (f!=l)
std::cout << "remaining unparsed input: '" << std::string(f,l) << "'\n";
}
}
Which prints (Live On Coliru):
parsed: (^+ line-17532.dyn.kponet.fi 2 7 377 1 +0.001503s +0.009103s 0.055s)
(debug output below)
Full listing:
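#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/adapted.hpp>
#include <boost/fusion/include/io.hpp> // operator<< for fusion sequences (as_vector output)
#include <chrono>
#include <iostream>
#include <sstream>
#include <string>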
namespace std { namespace chrono {
// for debug
std::ostream& operator<<(std::ostream& os, duration<double> d) { return os << d.count() << "s"; }
} }
namespace AST {
using clock = std::chrono::high_resolution_clock;
struct TimeSample {
enum Direction { up, down } direction; // + or -
clock::duration value;
// for debug:
friend std::ostream& operator<<(std::ostream& os, Direction d) {
char const* signs[] = {"+","-"};
return os << signs[d];
}
friend std::ostream& operator<<(std::ostream& os, TimeSample const& sample) {
return os << sample.direction << std::chrono::duration<double>(sample.value).count() << "s";
}
};
struct Record {
std::string prefix; // "^+"
std::string fqdn; // "line-17532.dyn.kponet.fi"
int a, b, c, d; // 2, 7, 377, 1
TimeSample primary, braced;
clock::duration tolerance;
};
}
BOOST_FUSION_ADAPT_STRUCT(AST::Record, prefix, fqdn, a, b, c, d, primary, braced, tolerance)
BOOST_FUSION_ADAPT_STRUCT(AST::TimeSample, direction, value)
namespace qi = boost::spirit::qi;
template <typename It = std::string::const_iterator>
struct parser : qi::grammar<It, AST::Record()> {
parser() : parser::base_type(start) {
using namespace qi;
start = skip(blank) [record_];
record_ = prefix_ >> fqdn_ >> int_ >> int_ >> int_ >> int_ >> sample_ >> '[' >> sample_ >> ']' >> tolerance_;
prefix_ = string("^+"); // or whatever you need to match here
fqdn_ = +graph; // or whatever additional constraints you have
sample_ = direction_ >> duration_;
duration_ = (long_ >> units_) [ _val = _1 * _2 ];
tolerance_= "+/-" >> duration_;
BOOST_SPIRIT_DEBUG_NODES(
(start)(record_)
(prefix_)(fqdn_)(sample_)(duration_)(tolerance_)
)
}
private:
struct directions : qi::symbols<char, AST::TimeSample::Direction> {
directions() { add("+", AST::TimeSample::up)("-", AST::TimeSample::down); }
} direction_;
struct units : qi::symbols<char, AST::clock::duration> {
units() {
using namespace std::literals::chrono_literals;
add("s", 1s)("ms", 1ms)("us", 1us)("µs", 1us)("ns", 1ns);
}
} units_;
using Skipper = qi::blank_type;
qi::rule<It, AST::Record()> start;
qi::rule<It, AST::Record(), Skipper> record_;
qi::rule<It, AST::TimeSample(), Skipper> sample_;
qi::rule<It, AST::clock::duration(), Skipper> duration_, tolerance_;
// lexemes:
qi::rule<It, std::string()> prefix_;
qi::rule<It, std::string()> fqdn_;
};
int main() {
std::istringstream iss(R"(^+ line-17532.dyn.kponet.fi 2 7 377 1 +1503us[+9103us] +/- 55ms
)");
std::string line;
while (getline(iss, line)) {
auto f = line.cbegin(), l = line.cend();
AST::Record record;
if (parse(f, l, parser<>{}, record))
std::cout << "parsed: " << boost::fusion::as_vector(record) << "\n";
else
std::cout << "parse error\n";
if (f!=l)
std::cout << "remaining unparsed input: '" << std::string(f,l) << "'\n";
}
}
<start>
<try>^+ line-17532.dyn.kp</try>
<record_>
<try>^+ line-17532.dyn.kp</try>
<prefix_>
<try>^+ line-17532.dyn.kp</try>
<success> line-17532.dyn.kpon</success>
<attributes>[[^, +]]</attributes>
</prefix_>
<fqdn_>
<try>line-17532.dyn.kpone</try>
<success> 2 7 377 </success>
<attributes>[[l, i, n, e, -, 1, 7, 5, 3, 2, ., d, y, n, ., k, p, o, n, e, t, ., f, i]]</attributes>
</fqdn_>
<sample_>
<try> +1503us[+9103us] </try>
<duration_>
<try>1503us[+9103us] +/- </try>
<success>[+9103us] +/- 55ms</success>
<attributes>[0.001503s]</attributes>
</duration_>
<success>[+9103us] +/- 55ms</success>
<attributes>[[+, 0.001503s]]</attributes>
</sample_>
<sample_>
<try>+9103us] +/- 55ms</try>
<duration_>
<try>9103us] +/- 55ms</try>
<success>] +/- 55ms</success>
<attributes>[0.009103s]</attributes>
</duration_>
<success>] +/- 55ms</success>
<attributes>[[+, 0.009103s]]</attributes>
</sample_>
<tolerance_>
<try> +/- 55ms</try>
<duration_>
<try> 55ms</try>
<success></success>
<attributes>[0.055s]</attributes>
</duration_>
<success></success>
<attributes>[0.055s]</attributes>
</tolerance_>
<success></success>
<attributes>[[[^, +], [l, i, n, e, -, 1, 7, 5, 3, 2, ., d, y, n, ., k, p, o, n, e, t, ., f, i], 2, 7, 377, 1, [+, 0.001503s], [+, 0.009103s], 0.055s]]</attributes>
</record_>
<success></success>
<attributes>[[[^, +], [l, i, n, e, -, 1, 7, 5, 3, 2, ., d, y, n, ., k, p, o, n, e, t, ., f, i], 2, 7, 377, 1, [+, 0.001503s], [+, 0.009103s], 0.055s]]</attributes>
</start>