Reputation: 21
Maybe this is easy, but I could not find the answer myself.
I want to use boost::tokenizer but keep the delimiters with the tokens.
My string is a bunch of numbers like this:
"1.00299 344.2221-25.112-33112"
The result should be:
"1.00299" "344.2221" "-25.112" "-33112"
I know it looks a bit odd, but the files are written like that.
The second case is a bit more complex, since some strings look like this:
"1.00299E+45 344.22E-21-25.112E+11-3.31E-12"
which should be:
"1.00299E+45" "344.22E-21" "-25.112E+11" "-3.31E-12"
Any help would be greatly appreciated.
Regards
Julia
Upvotes: 1
Views: 570
Reputation: 393064
Someone brought to my attention that you might not actually want the quotes there.
If you just wanted to parse the numbers with full fidelity¹:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi_match.hpp>
#include <iostream>
#include <vector>
namespace qi = boost::spirit::qi;
int main() {
    std::vector<double> values;
    // Each input line is a quoted run of numbers; qi::space is the skipper.
    std::cin >> std::noskipws >> qi::phrase_match(*('"' > *qi::double_ > '"'), qi::space, values);
    for (auto d : values)
        std::cout << d << "\n";
}
Which prints:
1.00299
344.222
-25.112
-33112
1.00299e+45
3.4422e-19
-2.5112e+12
-3.31e-12
¹ You could use long double, or qi::real_parser<T> with your choice of arbitrary-precision/decimal number type; see e.g. "Boost::Lexical_cast conversion to float changes data".
Upvotes: 2
Reputation: 393064
Let's implement a requote manipulator that allows you to do:
#include "requote.hpp"
#include <iostream>
int main() {
    std::cout << requote(std::cin);
}
Now what's in requote.hpp?
#include <istream>
struct requote {
    requote(std::istream& is) : _is(is.rdbuf()) {}
    friend std::ostream& operator<<(std::ostream& os, requote const& manip) {
        return manip.call(os);
    }
  private:
    std::ostream& call(std::ostream& os) const;
    mutable std::istream _is;
};
Note: We instantiate a private istream using the same streambuf, so the stream state is isolated.
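To illustrate the note above, here is a minimal standard-library sketch of what "isolated stream state" means. The names count_words_isolated and outer_state_untouched are hypothetical helpers made up for this illustration; they are not part of the answer's code. A second std::istream built on the same stream buffer consumes the same characters, but its eofbit/failbit are its own.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads every whitespace-separated word through a private istream that
// shares the caller's stream buffer. Reaching end-of-input sets eofbit and
// failbit on the private stream only; the caller's flags are not modified.
std::size_t count_words_isolated(std::istream& is) {
    std::istream priv(is.rdbuf());
    std::size_t n = 0;
    for (std::string w; priv >> w; )
        ++n;
    return n;
}

// Usage sketch: the outer stream still reports good() after the private
// stream has read the shared buffer to its end.
bool outer_state_untouched() {
    std::istringstream in("a b c");
    count_words_isolated(in);
    return in.good();
}
```

The characters themselves are still consumed from the shared buffer, so a later read on the outer stream would find it empty; only the state flags are separate.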
All the magic is in call(). Here's how I'd do this using Boost Spirit. The copy_out functor writes each matched number to the output, wrapped in quotes and followed by a space:
#include "requote.hpp"
#include <algorithm>
#include <iterator>
#include <stdexcept>
#include <string>
namespace /*anon*/ {
    struct copy_out {
        mutable std::ostreambuf_iterator<char> out;
        //template <typename...> struct result { typedef void type; };
        // r is the matched source range (from raw[]); emit it as "text" plus a space
        template <typename R> void operator()(R const& r) const {
            *out++ = '"';
            out = std::copy(r.begin(), r.end(), out);
            *out++ = '"';
            *out++ = ' ';
        }
    };
}
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
std::ostream& requote::call(std::ostream& os) const {
    boost::phoenix::function<copy_out> copy_out_({os});
    using namespace boost::spirit::qi;
    boost::spirit::istream_iterator f(_is >> std::noskipws), l;
    // one quoted line per iteration; a newline is written after each line
    bool ok = phrase_parse(f,l,
            *('"' > *raw[long_double][copy_out_(_1)] > '"') [boost::phoenix::ref(os)<<'\n'],
            space
        );
    if (ok && f==l)
        return os;
    throw std::runtime_error("parse error at '" + std::string(f,l) + "'");
}
The complete program as a single file:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <algorithm>
#include <istream>
#include <iterator>
#include <stdexcept>
#include <string>
struct requote {
    requote(std::istream& is) : _is(is.rdbuf()) {}
    friend std::ostream& operator<<(std::ostream& os, requote const& manip) {
        return manip.call(os);
    }
  private:
    std::ostream& call(std::ostream& os) const {
        boost::phoenix::function<copy_out> copy_out_({os});
        using namespace boost::spirit::qi;
        boost::spirit::istream_iterator f(_is >> std::noskipws), l;
        bool ok = phrase_parse(f,l,
                *('"' > *raw[long_double][copy_out_(_1)] > '"') [boost::phoenix::ref(os)<<'\n'],
                space
            );
        if (ok && f==l)
            return os;
        throw std::runtime_error("parse error at '" + std::string(f,l) + "'");
    }
    struct copy_out {
        mutable std::ostreambuf_iterator<char> out;
        //template <typename...> struct result { typedef void type; };
        template <typename R> void operator()(R const& r) const {
            *out++ = '"';
            out = std::copy(r.begin(), r.end(), out);
            *out++ = '"';
            *out++ = ' ';
        }
    };
    mutable std::istream _is;
};
#include <iostream>
int main() {
    std::cout << requote(std::cin);
}
Output for the sample from the question:
"1.00299" "344.2221" "-25.112" "-33112"
"1.00299E+45" "344.22E-21" "-25.112E+11" "-3.31E-12"
Upvotes: 2
Reputation: 11181
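Assuming that you need to read those values as doubles, you can do that in a few lines with std::strtod:
#include <cstdlib>
#include <vector>
std::vector<double> parse(const char * p)
{
    std::vector<double> d;
    while ( *p ) {
        char * end = nullptr;
        const double v = std::strtod(p, &end);
        if ( end == p ) break; // nothing consumed (junk or trailing whitespace): stop instead of looping forever
        d.push_back(v);
        p = end;
    }
    return d;
}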
Or with standard streams:
#include <istream>
#include <vector>
std::vector<double> parse(std::istream & in)
{
    /** Assumes default formatting flags on *in* (whitespace skipping enabled) */
    std::vector<double> d;
    for (double v; in >> v; )
        d.push_back(v);
    return d;
}
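If the original text of each number is needed rather than its double value (the question asks for strings), the end pointer that std::strtod reports can be used to cut the tokens out of the input. The following is only a sketch under that assumption; number_tokens is a hypothetical name, and the function stops at the first position where strtod cannot parse anything.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Returns the source text of each number std::strtod recognises in `s`,
// in order. Parsing stops at the first position where nothing is consumed.
std::vector<std::string> number_tokens(const std::string& s) {
    std::vector<std::string> tokens;
    const char* p = s.c_str();
    while (*p) {
        char* end = nullptr;
        std::strtod(p, &end);
        if (end == p)
            break;  // no number here (junk or trailing whitespace)
        // strtod skips leading whitespace; leave it out of the token text
        while (std::isspace(static_cast<unsigned char>(*p)))
            ++p;
        tokens.emplace_back(p, static_cast<std::size_t>(end - p));
        p = end;
    }
    return tokens;
}
```

Note that strtod also accepts forms such as "inf", "nan" and hexadecimal floats, so this is looser than a strict decimal grammar.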
Upvotes: 0
Reputation: 575
In general, tokenizers discard the delimiters, and the problem you describe is more complicated than what a general-purpose tokenizer can handle. You want to split on + or -, but only when the sign does not come directly after an 'E'. That is not logic you can easily express to an all-purpose tokenizer.
You should probably consider writing a method to parse the string yourself. You could even still have the tokenizer split on ' ' and then parse the substrings to handle the other cases.
If you have control of the input data, it would be better to force whitespace between values and split on ' '.
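A hand-written splitter along these lines could look like the sketch below. The function name split_numbers is made up for illustration, and the code assumes the input contains only numbers, whitespace, and signs; it does not validate that each token is a well-formed number.

```cpp
#include <string>
#include <vector>

// Splits a run of numbers into tokens, keeping each sign with its number.
// A '+' or '-' starts a new token unless it directly follows 'E' or 'e'
// (an exponent sign). Whitespace always ends the current token.
std::vector<std::string> split_numbers(const std::string& s) {
    std::vector<std::string> tokens;
    std::string cur;
    for (char c : s) {
        const bool is_space = (c == ' ' || c == '\t' || c == '\n' || c == '\r');
        const bool is_sign = (c == '+' || c == '-');
        const bool after_exp =
            !cur.empty() && (cur.back() == 'E' || cur.back() == 'e');
        if (is_space || (is_sign && !after_exp)) {
            if (!cur.empty())
                tokens.push_back(cur);  // close the token in progress
            cur.clear();
        }
        if (!is_space)
            cur.push_back(c);  // signs stay attached to the number they start
    }
    if (!cur.empty())
        tokens.push_back(cur);
    return tokens;
}
```

For the two sample strings from the question, this yields the four tokens listed there, with the minus signs kept on the numbers they belong to.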
Upvotes: 0