Nick
Nick

Reputation: 10499

Parse a C-string of floating numbers

I have a C-string which contains a list of floating numbers separated by commas and spaces. Each pair of numbers is separated by one (or more) spaces and represents a point where the x and y fields are separated by a comma (and optionally by spaces).

" 10,9 2.5, 3   4 ,150.32 "

I need to parse this string in order to fill a list of Point(x, y).
Following is my current implementation:

const char* strPoints = getString();
std::istringstream sstream(strPoints);

float x, y;
char comma;

while (sstream >> x >> comma >> y)
{
   myList.push(Point(x, y));
}

Since I need to parse a lot (up to 500,000) of these strings I'm wondering if there is a faster solution.

Upvotes: 5

Views: 1592

Answers (3)

sehe
sehe

Reputation: 393457

Look at Boost Spirit:

It supports NaN, positive and negative infinity just fine. Also it allows you to express the constraining grammar succinctly.

  1. Simple adaptation of the code

    Here is the adapted sample for your grammar:

    struct Point { float x,y; };
    typedef std::vector<Point> data_t;
    
    // And later:
    bool ok = phrase_parse(f,l,*(double_ > ',' > double_), space, data);
    

    The iterators can be any iterators. So you can hook it up with your C string just fine.

    Here's a straight adaptation of the linked benchmark case. This shows you how to parse from any std::istream or directly from a memory mapped file.

    Live On Coliru


  2. Further optimizations (strictly for C strings)

    Here's a version that doesn't need to know the length of the string up front (this is neat because it avoids the strlen call in case you didn't have the length available):

    template <typename OI>
    static inline void parse_points(OI out, char const* it, char const* last = std::numeric_limits<char const*>::max()) {
        namespace qi  = boost::spirit::qi;
        namespace phx = boost::phoenix;
    
        bool ok = qi::phrase_parse(it, last,
                *(qi::double_ >> ',' >> qi::double_) [ *phx::ref(out) = phx::construct<Point>(qi::_1, qi::_2) ],
                qi::space);
    
        if (!ok || !(it == last || *it == '\0')) {
            throw it; // TODO proper error reporting?
        }
    }
    

    Note how I made it take an output iterator so that you get to decide how to accumulate the results. The obvious wrapper to /just/ parse to a vector would be:

    static inline data_t parse_points(char const* szInput) {
        data_t pts;
        parse_points(back_inserter(pts), szInput);
        return pts;
    }
    

    But you can also do different things (like append to an existing container, that could have reserved a known capacity up front etc.). Things like this often allow truly optimized integration in the end.

    Here's that code fully demo-ed in ~30 lines of essential code:

    Live On Coliru

  3. Extra Awesome Bonus

    To show off the flexibility of this parser; if you just wanted to check the input and get a count of the points, you can replace the output iterator with a simple lambda function that increments a counter instead of adds a newly constructed point.

    int main() {
        int count = 0;
        parse_points( " 10,9 2.5, 3   4 ,150.32    ", boost::make_function_output_iterator([&](Point const&){count++;}));
        std::cout << "elements in sample: " << count << "\n";
    }
    

    Live On Coliru

    Since everything is inlined the compiler will notice that the whole Point doesn't need to be constructed here and eliminate that code: http://paste.ubuntu.com/9781055/

    The main function is seen directly invoking the very parser primitives. Handcoding the parser won't get you better tuning here, at least not without a lot of effort.

Upvotes: 5

Ryan Burn
Ryan Burn

Reputation: 2316

I got much better performance parsing out the points using a combination of std::find and std::strtof and the code wasn't much more complicated. Here's the test I ran:

#include <iostream>                                                                             
#include <sstream>                                                                              
#include <random>                                                                               
#include <chrono>                                                                               
#include <cctype>                                                                               
#include <algorithm>                                                                            
#include <cstdlib>                                                                              
#include <forward_list>                                                                         

struct Point { float x; float y; };                                                             
using PointList = std::forward_list<Point>;                                                     
using Clock = std::chrono::steady_clock;                                                        
using std::chrono::milliseconds;                                                                

std::string generate_points(int n) {                                                            
  static auto random_generator = std::mt19937{std::random_device{}()};                          
  std::ostringstream oss;                                                                       
  std::uniform_real_distribution<float> distribution(-1, 1);                                    
  for (int i=0; i<n; ++i) {                                                                     
    oss << distribution(random_generator) << " ," << distribution(random_generator) << "\t \n"; 
  }                                                                                             
  return oss.str();                                                                             
}                                                                                               

PointList parse_points1(const char* s) {                                                        
  std::istringstream iss(s);                                                                    
  PointList points;                                                                             
  float x, y;                                                                                   
  char comma;                                                                                   
  while (iss >> x >> comma >> y)                                                                
       points.push_front(Point{x, y});                                                          
  return points;                                                                                
}                                                                                               

inline                                                                                          
std::tuple<Point, const char*> parse_point2(const char* x_first, const char* last) {            
  auto is_whitespace = [](char c) { return std::isspace(c); };                                  
  auto x_last  = std::find(x_first, last, ',');                                                 
  auto y_first = std::find_if_not(std::next(x_last), last, is_whitespace);                      
  auto y_last  = std::find_if(y_first, last, is_whitespace);                                    
  auto x = std::strtof(x_first, (char**)&x_last);                                               
  auto y = std::strtof(y_first, (char**)&y_last);                                               
  auto next_x_first = std::find_if_not(y_last, last, is_whitespace);                            
  return std::make_tuple(Point{x, y}, next_x_first);                                            
}                                                                                               

PointList parse_points2(const char* i, const char* last) {                                      
  PointList points;                                                                             
  Point point;                                                                                  
  while (i != last) {                                                                           
    std::tie(point, i) = parse_point2(i, last);                                                 
    points.push_front(point);                                                                   
  }                                                                                             
  return points;                                                                                
}                                                                                               

int main() {                                                                                    
  auto s = generate_points(500000);                                                             
  auto time0 = Clock::now();                                                                    
  auto points1 = parse_points1(s.c_str());                                                      
  auto time1 = Clock::now();                                                                    
  auto points2 = parse_points2(s.data(), s.data() + s.size());                                  
  auto time2 = Clock::now();                                                                    
  std::cout << "using stringstream: "                                                           
            << std::chrono::duration_cast<milliseconds>(time1 - time0).count() << '\n';         
  std::cout << "using strtof: "                                                                 
            << std::chrono::duration_cast<milliseconds>(time2 - time1).count() << '\n';         
  return 0;                                                                                     
}                                  

outputs:

using stringstream: 1262
using strtof: 120

Upvotes: 3

Philipp Cla&#223;en
Philipp Cla&#223;en

Reputation: 43999

You can first try to disable the sychronization with the C I/O:

std::ios::sync_with_stdio(false);

Source: Using scanf() in C++ programs is faster than using cin?

You can also try to use alternatives to iostream:

I think you should give the sync_with_stdio(false) a try. The other alternatives require more coding, and I'm not sure that you will win much (if any).

Upvotes: 0

Related Questions