Which of these methods are possible/more efficient

Question

I have a text file in a format like this:

ignore contents for about 8 lines
... 
       x        y         z
 - [7.6515, -10.8271, -28.5806, 123.8]
 - [7.6515, -10.8271, -28.5806, 125.0]
 - [7.6515, -10.8271, -28.5806, 125.9]
 - [7.6515, -10.8271, -28.5806, 126.8]
 - [7.6515, -10.8271, -28.5806, 127.9]
 - [7.6515, -10.8271, -28.5806, 128.9]
 - [7.6515, -10.8271, -28.5806, 130.0]
 - [7.6515, -10.8271, -28.5806, 130.9]
 - [7.6515, -10.8271, -28.5806, 131.8]

Is there a way to read the x, y points from all of the 35,000+ lines that look like the ones above at once, rather than one line at a time? If so, is that the best way to do it?

Or is it better to call getline on each line and then parse the line with boost::regex?

I need to get the x,y points and fill them into a float array.

I have been using boost::regex for this, but it means processing one line at a time. I have no idea how efficient that is, so I was wondering whether there is a better solution. If not, I can just continue with what I have been doing.

The solution has to be in C++.
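For reference, the per-line approach described above (getline followed by a regex match) could look roughly like the sketch below. It is not the asker's code, which was not posted. It uses std::regex from the standard library in place of boost::regex so that it builds without Boost, and the names XY and read_points_regex are made up for illustration. The pattern assumes plain decimal numbers without exponents.

```cpp
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical type for one x, y pair.
struct XY
{
    float x;
    float y;
};

// Reads the stream line by line and keeps the first two values of every
// line that looks like " - [x, y, ...". Lines that do not match are skipped.
std::vector<XY> read_points_regex(std::istream& in)
{
    // Two captured numbers: optional sign, digits, optional fraction.
    static const std::regex row{
        R"(^\s*-\s*\[\s*([-+]?\d+(?:\.\d+)?)\s*,\s*([-+]?\d+(?:\.\d+)?)\s*,)" };

    std::vector<XY> points;
    std::string line;
    std::smatch m;
    while (std::getline(in, line))
    {
        if (std::regex_search(line, m, row))
            points.push_back({ std::stof(m[1].str()), std::stof(m[2].str()) });
    }
    return points;
}
```

Because non-matching lines are skipped, the header lines need no special handling here. Whether this is faster or slower than other approaches would have to be measured on the real file.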

user4832129 · Accepted Answer

No one has answered yet, so I'll give it a try. You didn't post your regex solution, so I can't compare the performance, but I suspect my code may be a little faster.

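#include <algorithm>
#include <istream>
#include <sstream>
#include <string>

struct Point
{
    float x;
    float y;
};

// Strips the list marker, brackets and commas so that only
// whitespace-separated numbers remain.
void transform_string( std::string& str )
{
    // Erase the leading '-' marker. Searching only up to '[' keeps the
    // minus signs of negative values intact.
    auto bracket = std::find( std::begin( str ), std::end( str ), '[' );
    str.erase( std::remove( std::begin( str ), bracket, '-' ), bracket );
    // Erase brackets and commas; the space after each comma still separates
    // the numbers. std::remove_if only shifts elements, so erase() is
    // needed to actually shrink the string.
    str.erase(
        std::remove_if(
            std::begin( str ),
            std::end( str ),
            [] ( char c )
            {
                return c == ',' || c == '[' || c == ']';
            } ),
        std::end( str ) );
}

// Reads one line and parses its first two values into p.
// The stream tests false once reading or parsing fails.
std::istream& get_point( std::istream& in, Point& p )
{
    std::string str;
    if ( !std::getline( in, str ) )
        return in;
    transform_string( str );
    std::istringstream iss { str };
    if ( !( iss >> p.x >> p.y ) )
        in.setstate( std::ios::failbit );   // blank or malformed line
    return in;
}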

The code is self-explanatory (I hope). It reads a line into a string, removes the characters that get in the way, and uses std::istringstream to parse the floats. It depends only on the standard library, is easy to read, and its performance is more than enough for a one-time operation (it took ~300 ms to process a file with 50k lines on my laptop). It makes some assumptions about the input and doesn't validate it. You use get_point the same way you would use operator >>. Hope this helps.

UPD: Test program:

#include <fstream>
#include <iostream>
#include <vector>

// Point and get_point are the ones defined above.
int main()
{
    std::ifstream in_file { "data.txt" };
    std::vector< Point > points;
    // Some code to prepare the stream goes here, e.g. skip the first 8 lines:
    // std::string tmp;
    // for ( int i = 0; i < 8; ++i ) std::getline( in_file, tmp );
    Point p;
    while ( get_point( in_file, p ) )
        points.push_back( p );

    for ( const auto& point : points )
        std::cout << point.x << ' ' << point.y << '\n';
}

The assumption I made: the input stream contains only data with the structure shown in the question. If there are, for example, other characters, empty lines, or other content, it won't work. If this assumption doesn't hold, please say so in the question.
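If the file does contain headers, blank lines, or other stray content, a more forgiving reader is one way around that assumption. The sketch below skips every line that does not parse as a data row instead of stopping. The names try_parse_row and read_points_lenient are made up for illustration, and Point is repeated so the snippet compiles on its own.

```cpp
#include <algorithm>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Point
{
    float x;
    float y;
};

// Returns true only if the line contains '[' followed by at least two numbers.
bool try_parse_row(std::string line, Point& p)
{
    const auto open = line.find('[');
    if (open == std::string::npos)
        return false;                                   // header or blank line
    line.erase(0, open + 1);                            // drop " - ["
    std::replace(line.begin(), line.end(), ',', ' ');   // commas become separators
    std::istringstream iss{ line };
    return static_cast<bool>(iss >> p.x >> p.y);
}

// Reads the whole stream and keeps only the rows that parse.
std::vector<Point> read_points_lenient(std::istream& in)
{
    std::vector<Point> points;
    std::string line;
    Point p{};
    while (std::getline(in, line))
    {
        if (try_parse_row(line, p))
            points.push_back(p);
    }
    return points;
}
```

With this version no lines have to be skipped by hand, at the price of silently ignoring anything unexpected. Counting or logging the skipped lines would be a reasonable addition.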
