Tanner Summers

Reputation: 663

Which of these methods is possible/more efficient?

I have a text file in a format like this:

ignore contents for about 8 lines
... 
       x        y         z
 - [7.6515, -10.8271, -28.5806, 123.8]
 - [7.6515, -10.8271, -28.5806, 125.0]
 - [7.6515, -10.8271, -28.5806, 125.9]
 - [7.6515, -10.8271, -28.5806, 126.8]
 - [7.6515, -10.8271, -28.5806, 127.9]
 - [7.6515, -10.8271, -28.5806, 128.9]
 - [7.6515, -10.8271, -28.5806, 130.0]
 - [7.6515, -10.8271, -28.5806, 130.9]
 - [7.6515, -10.8271, -28.5806, 131.8]

Is there a way to extract the x,y points from all of the 35000+ lines that look like the ones above in one go, rather than line by line? If so, is that the best way to do it?

Or is it better to call getline for each line and then parse the line using boost::regex?

I need to get the x,y points and fill them into a float array.

I have been using boost::regex for my needs, but it involves processing one line at a time. I have no idea how efficient that is, so I was wondering whether there is a better solution. If not, I can just continue with what I have been doing.

The solution has to be in C++.

Upvotes: 2

Views: 118

Answers (2)

sehe

Reputation: 393613

Here's a take using Boost Spirit X3 and a mapped file.

struct Point { double x, y, z; };

template <typename Container>
bool parse(std::string const& fname, Container& into) {
    boost::iostreams::mapped_file mm(fname);

    using namespace boost::spirit::x3;

    return phrase_parse(mm.begin(), mm.end(),
            seek[ eps >> 'x' >> 'y' >> 'z' >> eol ] >> // skip contents for about 8 lines
            ('-' >> ('[' >> double_ >> ',' >> double_ >> ',' >> double_ >> omit[',' >> double_] >> ']')) % eol, // parse points
            blank, into);
}

Spirit is a parser generator, so it generates the parsing code for you based on the expressions (e.g. 'x' >> 'y' >> 'z' >> eol to match the header line).

Unlike regular expressions, Spirit knows how to convert and store the parsed values, so you can parse straight into e.g. a vector<Point> (once Point has been adapted with BOOST_FUSION_ADAPT_STRUCT, as in the full demo below):

int main()
{
    std::vector<Point> v;

    if (parse("input.txt", v)) {
        std::cout << "Parsed " << v.size() << " elements\n";
        for (Point& p : v) {
            std::cout << "{" << p.x << ';' << p.y << ';' << p.z << "}\n";
        }
    } else {
        std::cout << "Parse failed\n";
    } 
}

Full Demo

Here the program parses itself with the sample data from your question embedded:

Live On Coliru

#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/iostreams/device/mapped_file.hpp>

struct Point { double x, y, z; };

BOOST_FUSION_ADAPT_STRUCT(Point,x,y,z)

template <typename Container>
bool parse(std::string const& fname, Container& into) {
    boost::iostreams::mapped_file mm(fname);

    using namespace boost::spirit::x3;

    return phrase_parse(mm.begin(), mm.end(),
            seek[ eps >> 'x' >> 'y' >> 'z' >> eol ] >> // skip contents for about 8 lines
            ('-' >> ('[' >> double_ >> ',' >> double_ >> ',' >> double_ >> omit[',' >> double_] >> ']')) % eol, // parse points
            blank, into);
}

int main()
{
    std::vector<Point> v;

    if (parse("main.cpp", v)) {
        std::cout << "Parsed " << v.size() << " elements\n";
        for (Point& p : v) {
            std::cout << "{" << p.x << ';' << p.y << ';' << p.z << "}\n";
        }
    } else {
        std::cout << "Parse failed\n";
    } 
}

#if DATA
ignore contents for about 8 lines
... 
       x        y         z
 - [7.6515, -10.8271, -28.5806, 123.8]
 - [7.6515, -10.8271, -28.5806, 125.0]
 - [7.6515, -10.8271, -28.5806, 125.9]
 - [7.6515, -10.8271, -28.5806, 126.8]
 - [7.6515, -10.8271, -28.5806, 127.9]
 - [7.6515, -10.8271, -28.5806, 128.9]
 - [7.6515, -10.8271, -28.5806, 130.0]
 - [7.6515, -10.8271, -28.5806, 130.9]
 - [7.6515, -10.8271, -28.5806, 131.8]
#endif

Prints

Parsed 9 elements
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}

Upvotes: 3

user4832129


No one had answered yet, so I'll give it a try. You didn't post your regex solution, so I can't compare the performance, but I speculate that my code may be a little faster.

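#include <algorithm>
#include <istream>
#include <sstream>
#include <string>

struct Point
{
    float x;
    float y;
};

// Strips the leading '-' and every ',', '[' and ']' so that only
// whitespace-separated numbers remain in str.
void transform_string( std::string& str )
{
    // Only look for '-' before the '[' so the minus signs of the numbers survive.
    auto i { std::find( std::begin( str ), std::end( str ), '[' ) };
    // std::remove only shifts the kept characters forward; erase drops the leftover tail.
    str.erase( std::remove( std::begin( str ), i, '-' ), i );
    str.erase(
        std::remove_if(
            std::begin( str ),
            std::end( str ),
            [] ( char c )
            {
                return c == ',' || c == '[' || c == ']';
            } ),
        std::end( str ) );
}

// Reads one line from in and stores its first two numbers in p.
std::istream& get_point( std::istream& in, Point& p )
{
    std::string str;
    std::getline( in, str );
    if ( !str.empty() )
    {
        transform_string( str );
        std::istringstream iss { str };
        iss >> p.x >> p.y;
    }
    return in;
}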

The code is (I hope) self-explanatory: it reads a line into a string, removes the hindering characters, and uses std::istringstream to parse the floats. It depends only on the standard library, is easy to read, and its performance is more than enough for a one-time operation (it took ~300 ms to process a file with 50k lines on my laptop). It makes some assumptions about the input and does no validation. You use get_point in a similar way to operator>>. Hope this helps.
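If the std::istringstream per line ever turns out to be the bottleneck, one alternative is to parse the numbers in place with std::strtof. This is only a sketch under the same input assumptions as above; XY and parse_xy are hypothetical names that are not part of the code above, and std::strtof follows the current C locale for the decimal point.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical result type for this sketch.
struct XY
{
    float x;
    float y;
};

// Hypothetical helper: extracts the first two numbers that follow '[' on a
// line such as " - [7.6515, -10.8271, -28.5806, 123.8]".
// Returns false (and leaves out untouched) if the line does not match.
bool parse_xy( const std::string& line, XY& out )
{
    const std::size_t open = line.find( '[' );
    if ( open == std::string::npos )
        return false;

    const char* cursor = line.c_str() + open + 1;
    char* end = nullptr;

    // std::strtof skips leading whitespace and reports where it stopped.
    const float x = std::strtof( cursor, &end );
    if ( end == cursor )   // nothing was converted
        return false;

    // Expect a comma between the first and the second number.
    cursor = end;
    while ( *cursor == ' ' || *cursor == '\t' )
        ++cursor;
    if ( *cursor != ',' )
        return false;
    ++cursor;

    const float y = std::strtof( cursor, &end );
    if ( end == cursor )
        return false;

    out = { x, y };
    return true;
}
```

It would be called once per line read with std::getline; lines for which it returns false (the preamble, the header, empty lines) can simply be skipped.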

UPD: Test program:

#include <fstream>
#include <iostream>
#include <vector>

int main()
{
    // std::ifstream opens read-only; std::fstream would also request write access.
    std::ifstream in_file { "data.txt" };
    std::vector< Point > points;
    // Some code to prepare stream, e.g. skip first 8 lines with
    // std::string tmp; for ( int i = 0; i < 8; ++i ) std::getline( in_file, tmp );
    Point p;
    while ( get_point( in_file, p ) )
        points.push_back( p );

    for ( const auto& point : points )
        std::cout << point.x << ' ' << point.y << '\n';
}

The assumption I made: the input stream contains only data with the structure shown in the question. If there are, for example, other characters, empty lines, or other content, it won't work. If this assumption does not hold, please say so in the question.
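If the preamble is not always exactly 8 lines long, one way to relax that part of the assumption is to read until the header line itself has been consumed. A minimal sketch, assuming the header consists of exactly the tokens x, y and z as in the question; skip_to_header is a hypothetical helper, not part of the code above.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: consumes lines until one whose whitespace-separated
// tokens are exactly "x", "y", "z" has been read.
// Returns false if the stream ends before such a line is found.
bool skip_to_header( std::istream& in )
{
    std::string line;
    while ( std::getline( in, line ) )
    {
        std::istringstream tokens { line };
        std::string a, b, c, extra;
        // Three tokens must be present and a fourth one must not.
        if ( tokens >> a >> b >> c && !( tokens >> extra )
             && a == "x" && b == "y" && c == "z" )
            return true;
    }
    return false;
}
```

After it returns true, the stream is positioned at the first data line, so the reading loop can start from there instead of skipping a fixed number of lines.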

Upvotes: 2
