Reputation: 663
I have a text file in a format like this:
ignore contents for about 8 lines
...
x y z
- [7.6515, -10.8271, -28.5806, 123.8]
- [7.6515, -10.8271, -28.5806, 125.0]
- [7.6515, -10.8271, -28.5806, 125.9]
- [7.6515, -10.8271, -28.5806, 126.8]
- [7.6515, -10.8271, -28.5806, 127.9]
- [7.6515, -10.8271, -28.5806, 128.9]
- [7.6515, -10.8271, -28.5806, 130.0]
- [7.6515, -10.8271, -28.5806, 130.9]
- [7.6515, -10.8271, -28.5806, 131.8]
Is there a way to get the x,y points from all of the 35000+ lines that look like the ones above at once? If so, is that the best way to do it?
Or is it better to call getline on each line and then parse the line using boost::regex?
I need to get the x,y points and fill them into a float array.
I have been using boost::regex for this, but that means taking one line at a time. I have no idea how efficient that is, so I was wondering whether there is a better solution. If not, I can just continue with what I have been doing.
The solution has to be in C++.
Upvotes: 2
Views: 118
Reputation: 393613
Here's a take using Boost Spirit X3 and a mapped file.
struct Point { double x, y, z; };
// Point also has to be adapted with BOOST_FUSION_ADAPT_STRUCT (see the full listing below)

template <typename Container>
bool parse(std::string const& fname, Container& into) {
    boost::iostreams::mapped_file mm(fname);

    using namespace boost::spirit::x3;
    return phrase_parse(mm.begin(), mm.end(),
        seek[ eps >> 'x' >> 'y' >> 'z' >> eol ] >> // skip contents for about 8 lines
        ('-' >> ('[' >> double_ >> ',' >> double_ >> ',' >> double_ >> omit[',' >> double_] >> ']')) % eol, // parse points
        blank, into);
}
Spirit is a parser generator, so it generates the parsing code for you from expressions such as 'x' >> 'y' >> 'z' >> eol (which matches the header line).
Unlike regular expressions, Spirit also knows how to convert and store the parsed values, so you can parse straight into e.g. a std::vector<Point>:
int main()
{
    std::vector<Point> v;

    if (parse("input.txt", v)) {
        std::cout << "Parsed " << v.size() << " elements\n";
        for (Point& p : v) {
            std::cout << "{" << p.x << ';' << p.y << ';' << p.z << "}\n";
        }
    } else {
        std::cout << "Parse failed\n";
    }
}
Here the program parses itself with the sample data from your question embedded:
#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/iostreams/device/mapped_file.hpp>

struct Point { double x, y, z; };
BOOST_FUSION_ADAPT_STRUCT(Point,x,y,z)

template <typename Container>
bool parse(std::string const& fname, Container& into) {
    boost::iostreams::mapped_file mm(fname);

    using namespace boost::spirit::x3;
    return phrase_parse(mm.begin(), mm.end(),
        seek[ eps >> 'x' >> 'y' >> 'z' >> eol ] >> // skip contents for about 8 lines
        ('-' >> ('[' >> double_ >> ',' >> double_ >> ',' >> double_ >> omit[',' >> double_] >> ']')) % eol, // parse points
        blank, into);
}

int main()
{
    std::vector<Point> v;

    if (parse("main.cpp", v)) {
        std::cout << "Parsed " << v.size() << " elements\n";
        for (Point& p : v) {
            std::cout << "{" << p.x << ';' << p.y << ';' << p.z << "}\n";
        }
    } else {
        std::cout << "Parse failed\n";
    }
}
#if DATA
ignore contents for about 8 lines
...
x y z
- [7.6515, -10.8271, -28.5806, 123.8]
- [7.6515, -10.8271, -28.5806, 125.0]
- [7.6515, -10.8271, -28.5806, 125.9]
- [7.6515, -10.8271, -28.5806, 126.8]
- [7.6515, -10.8271, -28.5806, 127.9]
- [7.6515, -10.8271, -28.5806, 128.9]
- [7.6515, -10.8271, -28.5806, 130.0]
- [7.6515, -10.8271, -28.5806, 130.9]
- [7.6515, -10.8271, -28.5806, 131.8]
#endif
Prints
Parsed 9 elements
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
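If Boost is not an option, the same "whole buffer at once" idea can be roughly sketched with the standard library alone. This is only an illustration, not equivalent to the Spirit grammar: `read_file` and `parse_buffer` are hypothetical helper names, the header is assumed to appear exactly as "x y z" followed by a newline, and rows are only checked as far as "three numbers follow the opening bracket".

```cpp
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

struct Point { double x, y, z; };

// Hypothetical helper: slurp the whole file into one string.
std::string read_file(std::string const& fname) {
    std::ifstream in(fname, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: parse every "[x, y, z, ...]" row after the "x y z" header.
// Assumption: the header line is literally "x y z\n".
std::vector<Point> parse_buffer(std::string const& text) {
    std::vector<Point> points;

    std::size_t at = text.find("x y z\n");
    if (at == std::string::npos)
        return points; // no header, nothing to parse

    while ((at = text.find('[', at)) != std::string::npos) {
        char const* cur = text.c_str() + at + 1; // first character after '['
        double v[3];
        bool ok = true;
        for (double& d : v) {
            char* end = nullptr;
            d = std::strtod(cur, &end); // skips leading whitespace itself
            if (end == cur) { ok = false; break; } // no number at this position
            cur = end;
            if (*cur == ',') ++cur; // step over the separator
        }
        if (ok)
            points.push_back({v[0], v[1], v[2]});
        ++at; // continue searching after this '['
    }
    return points;
}
```

Usage would be along the lines of `auto v = parse_buffer(read_file("input.txt"));`. Note that `std::strtod` honours the current C locale, so this sketch assumes the default "C" locale with '.' as the decimal point.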
Upvotes: 3
Reputation:
No one has answered yet, so I'll give it a try. You didn't post your regex solution, so I can't compare performance, but I suspect my code may be a little faster.
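#include <algorithm>
#include <istream>
#include <sstream>
#include <string>

struct Point
{
    float x;
    float y;
};

// Strip the leading '-' and the ',', '[' and ']' characters so that only
// whitespace-separated numbers remain.
void transform_string( std::string& str )
{
    auto i = std::find( std::begin( str ), std::end( str ), '[' );
    // std::remove only shifts elements; the leftover tail must be erased.
    str.erase( std::remove( std::begin( str ), i, '-' ), i );
    str.erase(
        std::remove_if(
            std::begin( str ),
            std::end( str ),
            [] ( char c )
            {
                return c == ',' || c == '[' || c == ']';
            } ),
        std::end( str ) );
}

// Read one line and extract the first two floats into p.
std::istream& get_point( std::istream& in, Point& p )
{
    std::string str;
    std::getline( in, str );
    if ( !str.empty() )
    {
        transform_string( str );
        std::istringstream iss { str };
        iss >> p.x >> p.y;
    }
    return in;
}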
The code is (I hope) self-explanatory. It reads a line into a string, removes the characters that get in the way, and uses std::istringstream to parse the floats. It depends only on the standard library, is easy to read, and its performance is more than enough for a one-time operation (it took ~300 ms to process a file with 50k lines on my laptop). It makes some assumptions about the input and doesn't do any validation. You use get_point in a similar way to operator >>. Hope this helps.
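To make the operator >> comparison concrete, here is a minimal, self-contained sketch that wraps the same idea in an actual operator>> for Point. It is a variation on the code above, not a drop-in replacement: it replaces the punctuation with spaces instead of removing it, it sets failbit on a line that doesn't parse, and `read_points` is a hypothetical helper name. It still assumes the stream contains only data lines.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Point
{
    float x;
    float y;
};

// Read one "- [x, y, z, w]" line and keep only x and y.
std::istream& operator>>( std::istream& in, Point& p )
{
    std::string line;
    if ( !std::getline( in, line ) )
        return in; // end of input: the stream is already in a failed state

    // Turn the punctuation into spaces so the numbers are whitespace-separated.
    std::replace_if(
        line.begin(), line.end(),
        [] ( char c ) { return c == ',' || c == '[' || c == ']'; },
        ' ' );

    std::istringstream iss { line };
    char dash {};
    if ( !( iss >> dash >> p.x >> p.y ) || dash != '-' )
        in.setstate( std::ios::failbit ); // a malformed line stops the reading loop
    return in;
}

// Hypothetical helper: collect points until the input ends or a line fails to parse.
std::vector< Point > read_points( std::istream& in )
{
    std::vector< Point > points;
    for ( Point p; in >> p; )
        points.push_back( p );
    return points;
}
```

With this in place the usual idiom `while ( in >> p )` works for Point just as it does for built-in types.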
UPD: Test program:
#include <fstream>
#include <iostream>
#include <vector>

int main()
{
    std::ifstream in_file { "data.txt" }; // input only, so std::ifstream rather than std::fstream
    std::vector< Point > points;
    // Some code to prepare stream, e.g. skip first 8 lines with
    // std::string tmp; for ( int i = 0; i < 8; ++i ) std::getline( in_file, tmp );
    Point p;
    while ( get_point( in_file, p ) )
        points.emplace_back( p );
    for ( auto& point : points )
        std::cout << point.x << ' ' << point.y << std::endl;
}
The assumption I made: the input stream contains only data with the structure shown in the question. If there are, for example, other characters, empty lines, or other content, it won't work. If this assumption does not hold, please say so in the question.
Upvotes: 2