Dabiel Kabuto

Reputation: 2862

Consecutive separators are ignored by Boost.Tokenizer

I am using Boost.Tokenizer to split a string. It works fine for strings like "1,2,3", but when there are two or more consecutive separators, for example "1,,3,4", it returns "1", "3", "4".

Is there a way to make the tokenizer return an empty string "" instead of skipping it?

Upvotes: 4

Views: 1437

Answers (2)

Tanner Sansbury

Reputation: 51871

Boost.Tokenizer's char_separator class provides the option to output an empty token or to skip ahead with its empty_tokens parameter. It defaults to boost::drop_empty_tokens, matching the behavior of strtok(), but can be told to output empty tokens by providing boost::keep_empty_tokens.

For example, with the following program:

#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>

int main()
{
  std::string str = "1,,3,4";
  typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
  boost::char_separator<char> sep(
      ",", // dropped delimiters
      "",  // kept delimiters (none)
      boost::keep_empty_tokens); // empty token policy

  BOOST_FOREACH(std::string token, tokenizer(str, sep))
  {
    std::cout << "<" << token << "> ";
  }
  std::cout << std::endl;
}

The output is:

<1> <> <3> <4> 

Upvotes: 6

alexbuisson

Reputation: 8469

I suppose you have used the split function as below:

string text = "1,,3,4";
list<string> tokenList;
split(tokenList, text, is_any_of(","));
BOOST_FOREACH(string t, tokenList)
{
  cout << t << "." << endl;
}

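If you look carefully at the split prototype, you will notice a defaulted last parameter, eCompress. It controls how adjacent separators are handled: token_compress_on merges them into a single separator, so empty tokens disappear, while token_compress_off yields an empty token between each pair of adjacent separators. token_compress_off is the default, so if empty tokens are being dropped, check whether token_compress_on is passed somewhere.

To make the intent explicit, pass token_compress_off as the last parameter: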

split(tokenList, text, is_any_of(","), token_compress_off);

Upvotes: 4
