ph4nt0m

Reputation: 968

C++ alternative for parsing input with sscanf

Assuming my program expects arguments of the form [ 0.562 , 1.4e-2 ] (i.e. pairs of floats), how should I parse this input in C++ without regular expressions? I know there are many corner cases to consider when it comes to user input, but let's assume the given input closely matches the above format (apart from extra whitespace).

In C, I could do something like sscanf(string, "[%g , %g]", &f1, &f2); to extract the two floating point values, which is very compact.
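For comparison, the sscanf approach can be written in C++ with std::sscanf from <cstdio>. The sketch below is not part of the question, and the helper name parse_pair is hypothetical. It guards against two things that are easy to miss: sscanf's return value counts only successful conversions, so a literal that fails to match after the last %g (such as the closing ']') goes unnoticed, and "[%g , %g]" with no space before ']' does not accept "1.4e-2 ]". The %n directive records how far scanning got, which confirms that the whole pattern matched.

```cpp
#include <cstdio>

// Hypothetical helper: parses "[ <float> , <float> ]" from a C string.
// Each ' ' in the format matches zero or more whitespace characters in the
// input, and %g stores into a float*.
bool parse_pair(const char* s, float& f1, float& f2)
{
    int consumed = -1;  // written by %n only if scanning gets that far
    int converted = std::sscanf(s, " [%g , %g ]%n", &f1, &f2, &consumed);
    // %n does not count towards the return value, so check both.
    return converted == 2 && consumed != -1;
}
```

With this format, "[0.562,1.4e-2]" is accepted as well, because each space in the format also matches no whitespace at all.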

In C++, this is what I've come up with so far:

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::string s = "[ 0.562 , 1.4e-2 ]"; // example input

    float f1 = 0.0f, f2 = 0.0f;

    // extract the text between '[' and ']'
    std::size_t leftBound = s.find('[', 0) + 1;
    std::size_t count = s.find(']', leftBound) - leftBound;

    std::istringstream ss(s.substr(leftBound, count));
    std::string garbage; // only used to swallow the ','

    ss >> f1 >> garbage >> f2;

    if (!ss)
        std::cout << "Error while parsing" << std::endl;
}

How could I improve this code? In particular, I'm concerned about the garbage string, but I don't know how else to skip the ',' between the two values.

Upvotes: 14

Views: 21024

Answers (4)

Mike Reznikov

Reputation: 33

Try this simple and powerful function; it's a draft I wrote (it needs C++23 for <spanstream>):

#include <cstddef>
#include <spanstream>   // std::ispanstream (C++23)
#include <string_view>

// Base case: no variables left to fill.
inline int sscan(std::string_view, std::string_view)
{
    return 0;
}

// Reads one value per "{}" in format; the literal text around each {} must
// match data exactly. Returns the number of values read, like sscanf.
template <typename T, typename... Args>
int sscan(std::string_view data, std::string_view format, T& var, Args&... args)
{
    constexpr auto npos = std::string_view::npos;

    // find the needle before {} and search for it in data
    const std::size_t format_first = format.find("{}");
    if (format_first == npos) {
        return 0;
    }
    const std::size_t data_first = data.find(format.substr(0, format_first));
    if (data_first == npos) {
        return 0;
    }
    const std::size_t value_begin = data_first + format_first;

    // try to find the next {}, take the needle between {} and {} and search for it in data
    const std::size_t search_next = format.find("{}", format_first + 2);
    std::size_t found_next = npos;
    std::size_t found_size = 0;
    if (search_next != npos) {
        found_size = search_next - format_first - 2;
        found_next = data.find(format.substr(format_first + 2, found_size), value_begin + 1);
    }

    // if both needles were found, take the data between them; otherwise take everything after the first
    const std::string_view part = (found_next == npos)
        ? data.substr(value_begin)
        : data.substr(value_begin, found_next - value_begin);
    std::ispanstream ss(part);
    if (!(ss >> var)) {
        return 0;
    }
    if (found_next == npos) {
        return 1;  // all remaining data was eaten by the current {}
    }
    return 1 + sscan(data.substr(found_next + found_size),
                     format.substr(search_next),
                     args...);
}

Example, for a key like "NAME_THIS_X5Y9":

std::string key = "NAME_THIS_X5Y9";
std::string name;
int x = 0;
int y = 0;
if (sscan(key, "{}_X{}Y{}", name, x, y) == 3) {
    // name == "NAME_THIS", x == 5, y == 9
}

Upvotes: 0

sehe

Reputation: 392931

If you can afford to use Boost, you could use Spirit.

See:

  • From a string: Live On Coliru (C++03)

  • Update: here's the approach if you were actually trying to read from a stream (it's somewhat simpler, and integrates really well with your other stream-reading code):
    Live On Coliru too (C++03)

Although this may seem more verbose, Spirit is also a lot more powerful and type-safe than sscanf, and it operates on streams.

Also note that inf, -inf, nan will be handled as expected.

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi_match.hpp>
#include <iostream>
#include <sstream>

namespace qi = boost::spirit::qi;

int main()
{
    std::istringstream ss("[ 0.562 , 1.4e-2 ]"); // example input
    ss.unsetf(std::ios::skipws); // we might **want** to handle whitespace in our grammar, not needed now

    float f1 = 0.0f, f2 = 0.0f;

    // qi::float_ matches the float attributes f1 and f2 directly
    if (ss >> qi::phrase_match('[' >> qi::float_ >> ',' >> qi::float_ >> ']', qi::space, f1, f2))
    {
        std::cout << "Parsed: " << f1 << " and " << f2 << "\n"; // default formatting...
    } else
    {
        std::cout << "Error while parsing" << std::endl;
    }
}

Upvotes: 7

Dietmar Kühl

Reputation: 153820

The obvious approach is to create a simple manipulator and use it. For example, a manipulator that takes the expected char as a template argument, checks whether the next non-whitespace character is that char and, if so, extracts it could look like this:

#include <iostream>
#include <sstream>
#include <string>

template <char C>
std::istream& expect(std::istream& in)
{
    if ((in >> std::ws).peek() == C) {
        in.ignore();
    }
    else {
        in.setstate(std::ios_base::failbit);
    }
    return in;
}

You can then use the manipulator built this way to extract the separator characters:

int main(int ac, char *av[])
{
    std::string s(ac == 1? "[ 0.562 , 1.4e-2 ]": av[1]);
    float f1 = 0.0f, f2 = 0.0f;

    std::istringstream in(s);
    if (in >> expect<'['> >> f1 >> expect<','> >> f2 >> expect<']'>) {
        std::cout << "read f1=" << f1 << " f2=" << f2 << '\n';
    }
    else {
        std::cout << "ERROR: failed to read '" << s << "'\n";
    }
}
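If the expected character is only known at run time, the same idea can be expressed with a small struct that carries the character and has its own operator>>. This is a sketch beyond the answer above; the names expect_char and read_pair are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical runtime counterpart of expect<C>: the character is a value
// stored in the struct rather than a template parameter.
struct expect_char
{
    char c;
};

// Skips leading whitespace, then consumes the next character if it equals
// e.c; otherwise puts the stream into the failed state.
std::istream& operator>>(std::istream& in, expect_char e)
{
    if ((in >> std::ws).peek() == std::char_traits<char>::to_int_type(e.c)) {
        in.ignore();
    }
    else {
        in.setstate(std::ios_base::failbit);
    }
    return in;
}

// Hypothetical helper: parses "[ f1 , f2 ]" using the runtime manipulator.
bool read_pair(const std::string& s, float& f1, float& f2)
{
    std::istringstream in(s);
    in >> expect_char{'['} >> f1 >> expect_char{','} >> f2 >> expect_char{']'};
    return static_cast<bool>(in);
}
```

Compared with the template version, this trades a compile-time constant for a value (so the separator could come from configuration, for instance); the behavior on a mismatch, setting failbit, is the same.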

Upvotes: 8

David G

Reputation: 96810

Other than regular expressions, there's probably something in Boost you can use. But if you can't use Boost, you can define a std::ctype<char> facet that effectively ignores the unwanted characters ('[', ',' and ']') by classifying them as whitespace. Install this facet into a locale and imbue ss with that locale.
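A minimal sketch of that facet approach, assuming '[', ']' and ',' are the only characters to ignore. The struct bracket_comma_ws and the helper read_pair are hypothetical names; the table is built by copying std::ctype<char>::classic_table() and adding the space bit for the separators.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: classifies '[', ']' and ',' as whitespace in addition
// to the characters the classic "C" table already marks as space.
struct bracket_comma_ws : std::ctype<char>
{
    static const mask* make_table()
    {
        // Copy the classic table once and add the space bit for the separators.
        static std::vector<mask> table(classic_table(), classic_table() + table_size);
        table[static_cast<unsigned char>('[')] |= space;
        table[static_cast<unsigned char>(']')] |= space;
        table[static_cast<unsigned char>(',')] |= space;
        return table.data();
    }

    explicit bracket_comma_ws(std::size_t refs = 0)
        : std::ctype<char>(make_table(), false, refs) {}
};

// Hypothetical helper: reads two floats, letting the facet skip the separators.
bool read_pair(const std::string& s, float& f1, float& f2)
{
    std::istringstream ss(s);
    // The locale takes ownership of the facet (refs == 0).
    ss.imbue(std::locale(ss.getloc(), new bracket_comma_ws));
    return static_cast<bool>(ss >> f1 >> f2);
}
```

Note that this is more lenient than the other answers: since the separators become ordinary whitespace, input such as "0.562 1.4e-2" or "[[0.562,,1.4e-2" is accepted too.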

Upvotes: 2
