Jason Per

Reputation: 149

boost tokenizer / char separator

I have tried both the commented and uncommented versions of this code:

string separator1(""); // don't let quoted arguments escape themselves
string separator2(",\n"); // split on comma and newline
string separator3("\"\'"); // let it have quoted arguments

escaped_list_separator<char> els(separator1, separator2, separator3);
tokenizer<escaped_list_separator<char>> tok(str);//, els);

for (tokenizer<escaped_list_separator<char>>::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
    next = *beg;
    boost::trim(next);
    cout << counter << " " << next << endl;
    counter++;
}

to separate a file which has the following format:

 12345, Test Test, Test
 98765, Test2 test2, Test2

This is the output:

0 12345
1 Test Test
2 Test
98765
3 Test2 test2
4 Test2

I am not sure where the problem is, but I need 98765 to come out as its own token, with the number 3 in front of it.

Upvotes: 3

Views: 1326

Answers (2)

sehe

Reputation: 392911

It looks to me like you are parsing, not splitting.

Using a parser generator would be superior, IMO.

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;

int main() {
    boost::spirit::istream_iterator f(std::cin >> std::noskipws), l;

    std::vector<std::string> columns;
    qi::parse(f, l, +~qi::char_(",\r\n") % (qi::eol | ','), columns);

    size_t n = 0;
    for(auto& tok : columns) { std::cout << n++ << "\t" << tok << "\n"; }
}

Prints

0   12345
1    Test Test
2    Test
3   98765
4    Test2 test2
5    Test2

Frankly, I think it's superior because it will allow you to write

phrase_parse(f, l, (qi::int_ >> *(',' >> +~qi::char_("\r\n,"))) % qi::eol, qi::blank...);

and get proper parsing of the data types, whitespace skipping, etc. for "free".

Upvotes: 0

Rama

Reputation: 3305

The newline separator is not taking effect. Define string separator2(",\n"); and pass els to the tokenizer instead of leaving it commented out:

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
#include <boost/algorithm/string.hpp>

using namespace std;
using namespace boost;

int main() {
    string str = "TEst,hola\nhola";
    string separator1(""); // don't let quoted arguments escape themselves
    string separator2(",\n"); // split on comma and newline
    string separator3("\""); // let it have quoted arguments

    escaped_list_separator<char> els(separator1, separator2, separator3);
    tokenizer<escaped_list_separator<char>> tok(str, els);

    int counter = 0;
    string next;

    for (tokenizer<escaped_list_separator<char>>::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
        next = *beg;
        boost::trim(next);
        cout << counter << " " << next << endl;
        counter++;
    }
    return 0;
}

Upvotes: 2
