Julia Fischer

Reputation: 21

boost::tokenizer, but keeping the delimiters

Maybe this is easy, but I could not find the answer myself.

I want to use boost::tokenizer but keep the delimiters as part of the resulting tokens.

My string is a run of numbers like this:

"1.00299 344.2221-25.112-33112"

The result should be:

"1.00299"  "344.2221"  "-25.112" "-33112"

I know it looks a bit odd, but that is how the files are written.

A second, slightly more complex case: some strings look like this:

"1.00299E+45 344.22E-21-25.112E+11-3.31E-12" 

which should become:

"1.00299E+45" "344.22E-21" "-25.112E+11"  "-3.31E-12"

Any help would be greatly appreciated.

Regards

Julia

Upvotes: 1

Views: 570

Answers (4)

sehe

Reputation: 393064

Someone brought to my attention that you might not actually want the quotes there.

If you just wanted to parse the numbers with full fidelity¹:

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi_match.hpp>
#include <iostream>
#include <vector>
namespace qi = boost::spirit::qi;

int main() {
    std::vector<double> values;
    std::cin >> std::noskipws >> qi::phrase_match(*('"'>*qi::double_>'"'), qi::space, values);

    for (auto d : values)
        std::cout << d << "\n";
}

Which prints:

1.00299
344.222
-25.112
-33112
1.00299e+45
3.4422e-19
-2.5112e+12
-3.31e-12

¹ You could use long double, or qi::real_parser<T> with your choice of arbitrary-precision/decimal number type; see e.g. the question "Boost::Lexical_cast conversion to float changes data".

Upvotes: 2

sehe

Reputation: 393064

Let's implement a requote manipulator that allows you to do:

#include "requote.hpp"
#include <iostream>

int main() {
    std::cout << requote(std::cin);
}

Now what's in requote.hpp?

#pragma once
#include <istream>
#include <ostream>

struct requote {
    requote(std::istream& is) : _is(is.rdbuf()) {}

    friend std::ostream& operator<<(std::ostream& os, requote const& manip) {
        return manip.call(os);
    }

  private:
    std::ostream& call(std::ostream& os) const;
    mutable std::istream _is;
};

Note: We instantiate a private istream using the same streambuf, so the stream state is isolated.
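To make that note concrete, here is a minimal standard-library sketch of what "same streambuf, isolated state" means. The names view_demo_result and demo_private_view are made up for this illustration and are not part of requote.hpp.

```cpp
#include <ios>
#include <istream>
#include <sstream>

// Hypothetical result type, used only for this illustration.
struct view_demo_result {
    int  from_view;        // value read through the private stream
    int  from_source;      // value read afterwards through the original stream
    bool source_skips_ws;  // did the original stream keep its skipws flag?
    bool source_ok;        // did the original stream stay in a good state?
};

// Hypothetical helper: builds a second istream on the same buffer and shows
// which things are shared (the read position) and which are not (flags, state).
view_demo_result demo_private_view(std::istream& source) {
    std::istream view(source.rdbuf());      // same streambuf, separate stream object
    view_demo_result r{};
    view >> r.from_view;                    // consumes characters from the shared buffer
    view >> std::noskipws;                  // changes only the view's formatting flags
    view.setstate(std::ios_base::failbit);  // changes only the view's error state
    r.source_skips_ws = (source.flags() & std::ios_base::skipws) != 0;
    r.source_ok = source.good();
    source >> r.from_source;                // continues where the view stopped reading
    return r;
}
```

For example, with a std::istringstream holding "1 2", the view reads 1 and the original stream then reads 2. The original stream's flags and error state are untouched by what was done to the view. Only the buffer position is shared.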

All the magic is in call(). Here's how I'd do this using Boost Spirit. The complexity with copy_out is there to ensure both that

  • we do not alter any part of the input presentation (precision, formatting) except the quoting
  • it is as efficient as possible (we don't construct any temporary strings, except when reporting a parse error)

#include "requote.hpp"
#include <algorithm>
#include <iterator>
#include <stdexcept>
#include <string>

namespace /*anon*/ {
    struct copy_out {
        mutable std::ostreambuf_iterator<char> out;

        //template <typename...> struct result { typedef void type; };
        template <typename R> void operator()(R const& r) const {
            *out++ = '"';
            out = std::copy(r.begin(), r.end(), out);
            *out++ = '"';
            *out++ = ' ';
        }
    };
}

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

std::ostream& requote::call(std::ostream& os) const {
    boost::phoenix::function<copy_out> copy_out_({os});
    using namespace boost::spirit::qi;

    boost::spirit::istream_iterator f(_is >> std::noskipws), l;
    bool ok = phrase_parse(f,l,
            *('"' > *raw[long_double][copy_out_(_1)] > '"') [boost::phoenix::ref(os)<<'\n'],
            space
        );

    if (ok && f==l)
        return os;

    throw std::runtime_error("parse error at '" + std::string(f,l) + "'");
}

DEMO

Self-Contained On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <algorithm>
#include <iterator>
#include <stdexcept>
#include <string>

struct requote {
    requote(std::istream& is) : _is(is.rdbuf()) {}

    friend std::ostream& operator<<(std::ostream& os, requote const& manip) {
        return manip.call(os);
    }

  private:
    std::ostream& call(std::ostream& os) const {
        boost::phoenix::function<copy_out> copy_out_({os});
        using namespace boost::spirit::qi;

        boost::spirit::istream_iterator f(_is >> std::noskipws), l;
        bool ok = phrase_parse(f,l,
                *('"' > *raw[long_double][copy_out_(_1)] > '"') [boost::phoenix::ref(os)<<'\n'],
                space
            );

        if (ok && f==l)
            return os;

        throw std::runtime_error("parse error at '" + std::string(f,l) + "'");
    }

    struct copy_out {
        mutable std::ostreambuf_iterator<char> out;

        //template <typename...> struct result { typedef void type; };
        template <typename R> void operator()(R const& r) const {
            *out++ = '"';
            out = std::copy(r.begin(), r.end(), out);
            *out++ = '"';
            *out++ = ' ';
        }
    };
    mutable std::istream _is;
};

#include <iostream>
int main() {
    std::cout << requote(std::cin);
}

Output for the sample from the question:

"1.00299" "344.2221" "-25.112" "-33112" 
"1.00299E+45" "344.22E-21" "-25.112E+11" "-3.31E-12" 

Upvotes: 2

sbabbi

Reputation: 11181

Assuming that you need to read those values as doubles, you can do that in 3 lines with std::strtod:

#include <cstdlib>
#include <vector>

std::vector<double> parse(const char * p)
{
    std::vector<double> d;

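    // Stop as soon as strtod converts nothing (end of input, trailing
    // whitespace, or invalid text); otherwise p would never advance.
    for (char * end = nullptr; ; p = end) {
        double v = std::strtod(p, &end);
        if (end == p)
            break;
        d.push_back(v);
    }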

    return d;
}

Or with standard streams:

std::vector<double> parse(std::istream & in)
{
    /** Assuming default flags for *in* */
    std::vector<double> d;

    for (double v; in >> v; )
        d.push_back(v);
    return d;
}
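If the original text of each number has to be kept (as in the question) rather than the converted double, the end pointer reported by std::strtod can be used to cut out the substrings instead. This is only a sketch: number_tokens is a made-up name, and it assumes the "C" locale and that the input contains nothing else std::strtod would accept (hex floats, "inf", "nan").

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: returns the exact text of each number found in p.
// std::strtod is used only to find where each number ends.
std::vector<std::string> number_tokens(const char* p) {
    std::vector<std::string> tokens;
    for (;;) {
        char* end = nullptr;
        std::strtod(p, &end);
        if (end == p)  // nothing converted: end of input or invalid text
            break;
        // strtod skips leading whitespace; drop it from the token as well
        while (p != end && std::isspace(static_cast<unsigned char>(*p)))
            ++p;
        tokens.emplace_back(p, static_cast<const char*>(end));
        p = end;       // continue right after the number just found
    }
    return tokens;
}
```

Because a sign is only accepted at the start of a number or directly after the exponent marker, "344.22E-21-25.112E+11" is cut into "344.22E-21" and "-25.112E+11" without any special handling.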

Upvotes: 0

Maria

Reputation: 575

In general, tokenizers discard the delimiters, and the problem you describe is a little more complicated than what a tokenizer can handle. You want to split before a + or -, but only if it does not come directly after an 'E'. That is not logic you can easily express to an all-purpose tokenizer.

You should probably consider writing a method to parse the string yourself. You could even still have the tokenizer split on ' ' and then parse the substrings to handle the other cases.

If you have control of the input data, it would be better to force whitespace between values and split on ' '.
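A hand-written split along those lines could look like the sketch below. The name split_numbers is made up for this example, and it assumes the only separators are whitespace and a sign that does not directly follow 'E' or 'e'. It does not validate that each token is a well-formed number.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: splits a line into number tokens, keeping the sign
// with the token it starts. A '+' or '-' begins a new token unless it
// directly follows an exponent marker ('E' or 'e').
std::vector<std::string> split_numbers(const std::string& line) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : line) {
        const bool is_space = std::isspace(static_cast<unsigned char>(c)) != 0;
        const bool is_sign = (c == '+' || c == '-');
        const bool after_exponent =
            !current.empty() && (current.back() == 'E' || current.back() == 'e');
        if (is_space || (is_sign && !after_exponent)) {
            if (!current.empty()) {  // close the token collected so far
                tokens.push_back(current);
                current.clear();
            }
            if (is_space)
                continue;            // whitespace is dropped, signs are kept
        }
        current.push_back(c);
    }
    if (!current.empty())
        tokens.push_back(current);
    return tokens;
}
```

For the first sample line from the question this yields 1.00299, 344.2221, -25.112 and -33112. Each token can then be converted with std::stod or a similar function if numeric values are needed.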

Upvotes: 0
