betabandido

Using templates for implementing a generic string parser

I am trying to come up with a generic solution for parsing strings with a given format. For instance, I would like to be able to parse a string containing a list of numeric values (integers or floats) and return them as a std::vector. This is what I have so far:

template<typename T, typename U>
T parse_value(const U& u) {
    throw std::runtime_error("no parser available");
}

template<typename T>
std::vector<T> parse_value(const std::string& s) {
    std::vector<std::string> parts;
    boost::split(parts, s, boost::is_any_of(","));
    std::vector<T> res;
    std::transform(parts.begin(), parts.end(), std::back_inserter(res),
            [](const std::string& s) { return boost::lexical_cast<T>(s); });
    return res;
}
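For readers without Boost, the split-and-convert step above can be sketched with the standard library alone. This is a minimal sketch assuming C++11; parse_list is a hypothetical name, and giving it a distinct name simply sidesteps the overload question this post is about. Each field is converted with operator>> instead of boost::lexical_cast.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical std-only stand-in for boost::split + boost::lexical_cast.
// Splits `s` on ',' and converts every field to T with operator>>.
template <typename T>
std::vector<T> parse_list(const std::string& s) {
  std::vector<T> res;
  std::istringstream fields(s);
  std::string field;
  while (std::getline(fields, field, ',')) {
    std::istringstream conv(field);
    T value{};
    // Reject fields that are empty, malformed, or followed by extra characters.
    if (!(conv >> value) || !(conv >> std::ws).eof())
      throw std::runtime_error("bad field: '" + field + "'");
    res.push_back(value);
  }
  return res;
}
```

Unlike boost::split, std::getline does not produce a final empty field, so a trailing comma (or an empty input string) is silently accepted here rather than rejected.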

Additionally, I would like to be able to parse strings containing other types of values. For instance:

struct Foo { /* ... */ };

template<>
Foo parse_value(const std::string& s) {
    /* parse string and return a Foo object */
}

The reason for maintaining a single "hierarchy" of parse_value functions is that sometimes I want to parse an optional value (which may or may not exist), using boost::optional. Ideally, I would have just a single parse_optional_value function that delegates to the corresponding parse_value function:

template<typename T>
boost::optional<T> parse_optional_value(const boost::optional<std::string>& s) {
    if (!s) return boost::optional<T>();
    return boost::optional<T>(parse_value<T>(*s));
}

My current solution does not work: the compiler cannot determine which function to use. I suspect the problem is that my solution relies on deducing the template argument from the return type of the parse_value functions. I am not sure how to fix this, or even whether it can be fixed, since the design approach could be fundamentally flawed. Does anyone know a way to do what I am trying to do? Even a pointer to a possible way to address the issues in my current implementation would be much appreciated. That said, I am definitely open to completely different ideas for solving this problem.

Answers (2)

c9s

Here is an example from the libsass parser:

const char* interpolant(const char* src) {
  return recursive_scopes< exactly<hash_lbrace>, exactly<rbrace> >(src);
}

// Match a single character literal.
// Regex equivalent: /(?:x)/
template <char chr>
const char* exactly(const char* src) {
  return *src == chr ? src + 1 : 0;
}

Rules built this way, by composing matchers as template arguments, can then be passed into the lex method.
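The fragment above relies on recursive_scopes, hash_lbrace and rbrace, which are defined elsewhere in libsass. A minimal self-contained sketch of the same combinator style, assuming C++11: sequence, one_plus, digit and bracketed_number are hypothetical helpers written for illustration, not copied from libsass. Each matcher takes a pointer into the input and returns the position just past its match, or nullptr on failure.

```cpp
#include <cassert>

// A matcher takes a pointer into the input and returns the position just
// past its match, or nullptr if it does not match.
using matcher = const char* (*)(const char*);

// Match a single character literal, in the style of exactly<chr> above.
template <char chr>
const char* exactly(const char* src) {
  return *src == chr ? src + 1 : nullptr;
}

// Hypothetical: match one ASCII digit.
const char* digit(const char* src) {
  return (*src >= '0' && *src <= '9') ? src + 1 : nullptr;
}

// Hypothetical: match mx1 immediately followed by mx2.
template <matcher mx1, matcher mx2>
const char* sequence(const char* src) {
  const char* p = mx1(src);
  return p ? mx2(p) : nullptr;
}

// Hypothetical: match mx one or more times (mx must always consume input,
// otherwise this loop would never terminate).
template <matcher mx>
const char* one_plus(const char* src) {
  const char* p = mx(src);
  if (!p) return nullptr;
  while (const char* q = mx(p)) p = q;
  return p;
}

// A composed rule: '[' followed by one or more digits followed by ']'.
const char* bracketed_number(const char* src) {
  return sequence< exactly<'['>, sequence< one_plus<digit>, exactly<']'> > >(src);
}
```

Because every rule is an ordinary function whose sub-rules are fixed at compile time, the compiler is free to inline the whole composed matcher.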

rici

You cannot overload functions based on their return type [1]. This is precisely why the standard IO library uses the construct:

std::cin >> a >> b;

which may not be your cup of tea (many people don't like it, and it is truly not without its problems), but it does a nice job of providing a target type to the parser. It also has an advantage over a static parse<X>(const std::string&) prototype: it allows for chaining and streaming, as above. Sometimes that's not needed, but in many parsing contexts it is essential, and operator>> is actually a pretty cool syntax. [2]
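The idea that the object being read into, not a return type, selects the parser can be applied to the question's types. A minimal sketch, assuming C++17 (std::optional stands in for boost::optional) and a made-up "a:b" text format for Foo; Foo's members and format are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical user type from the question, written as "a:b".
struct Foo { int a = 0; int b = 0; };

// The overload is chosen by the type of the target object, so Foo becomes
// readable exactly like int or double.
std::istream& operator>>(std::istream& in, Foo& foo) {
  char colon = 0;
  if (in >> foo.a >> colon >> foo.b && colon != ':')
    in.setstate(std::ios::failbit);  // wrong separator: mark the read as failed
  return in;
}

// Generic entry point: T is named explicitly and a T object is the target.
template <typename T>
T parse_value(const std::string& s) {
  std::istringstream in(s);
  T value{};
  if (!(in >> value) || !(in >> std::ws).eof())
    throw std::runtime_error("cannot parse '" + s + "'");
  return value;
}

// The question's optional wrapper, forwarding to parse_value<T>.
template <typename T>
std::optional<T> parse_optional_value(const std::optional<std::string>& s) {
  if (!s) return std::nullopt;
  return parse_value<T>(*s);
}
```

Because the Foo overload returns the stream, reads chain like built-in types, e.g. `in >> x >> y` on the input "1:2 3:4".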

The standard library doesn't do what would be far and away the coolest thing: skip string constants scanf-style and allow interleaved reading, i.e. values separated by a delimiter:

vector<int> integers;
std::cin >> "[" >> interleave(integers, ",") >> "]";

However, that could be defined. (It might be better to use an explicit wrapper around the string literals; I actually prefer the bare form, but if you were passing a variable you'd want a wrapper.)
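Such a chain could be built from two small wrappers. A minimal sketch, assuming C++11: lit, interleave_t and this interleave are hypothetical names, the literal uses the explicit-wrapper form, and the separator is a single char rather than the string in the example above.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper for a literal that must appear next in the input.
struct lit { std::string text; };

// Skip leading whitespace, then require the literal's characters in order;
// on mismatch, set failbit so the rest of the chain becomes a no-op.
std::istream& operator>>(std::istream& in, const lit& l) {
  in >> std::ws;
  char c;
  for (char expected : l.text) {
    if (!in.get(c) || c != expected) {
      in.setstate(std::ios::failbit);
      break;
    }
  }
  return in;
}

// Hypothetical wrapper: read T values separated by `sep` into `out`.
template <typename T>
struct interleave_t {
  std::vector<T>& out;
  char sep;
};

template <typename T>
interleave_t<T> interleave(std::vector<T>& out, char sep) {
  return {out, sep};
}

// Read one T, then keep reading "sep T" pairs; whitespace before a separator
// is skipped. good() is checked before peeking so a stream that already hit
// end of input keeps eofbit only, not failbit.
template <typename T>
std::istream& operator>>(std::istream& in, interleave_t<T> il) {
  T value{};
  if (!(in >> value)) return in;  // at least one element is required
  il.out.push_back(value);
  while (in.good() && (in >> std::ws).good() &&
         in.peek() == std::char_traits<char>::to_int_type(il.sep)) {
    in.get();                   // consume the separator
    if (!(in >> value)) break;  // separator without a value: failbit is set
    il.out.push_back(value);
  }
  return in;
}
```

Usage then reads `in >> lit{"["} >> interleave(integers, ',') >> lit{"]"};`. As written, an empty list such as "[]" leaves the stream in a failed state.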


[1] With C++11's auto declarations, the reason for this becomes even clearer: in auto x = parse(s); there is no target type from which the return type could be chosen.

[2] IO manipulators, on the other hand, are a cruel joke. And error handling is pathetic. But you can't have everything.
