nookonee

Reputation: 921

Splitting a string on multiple delimiters and saving it into a vector

I know there are many topics about problems like mine, but I can't find the right answer for my particular problem.

I would like to split my string into tokens on multiple delimiters (' ', '\n', '(', ')') and save everything in my vector (even the delimiters).

Here's the first code I wrote. It currently just splits the input into lines, but now I would like to split it on the other delimiters as well.

std::vector<std::string> Lexer::getToken(std::string flow)
{
    std::string token;
    std::vector<std::string> tokens;
    std::stringstream f;

    f << flow;
    while (std::getline(f, token, '\n'))
    {
        tokens.push_back(token);
    }
    return (tokens);
}

Example: if I have:

push int32(42)

I would like to get the following tokens:

push

int32

(

42

)

Upvotes: 1

Views: 1356

Answers (2)

Tony Delroy

Reputation: 106254

You can do this using per-character logic if you think through the states involved....

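// f is the std::stringstream from the question, already loaded with the input
std::vector<std::string> tokens;
std::string delims = " \n()";
char c;
bool last_was_delim = true;
while (f.get(c))
    if (delims.find(c) != std::string::npos) // c is a delimiter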
    {
        tokens.emplace_back(1, c);
        last_was_delim = true;
    }
    else
    {
        if (last_was_delim)
             tokens.emplace_back(1, c); // start new string
        else
             tokens.back() += c; // append to existing string
        last_was_delim = false;
    }

Note that this treats, say, "((" or "  " (two spaces) as repeated distinct delimiters, each entered into tokens separately. Tune to taste if necessary.
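
As one possible way to "tune" the loop above, here is a minimal sketch that still emits '(' and ')' as tokens but silently drops spaces and newlines, which matches the token list shown in the question. The function name tokenize_skip_whitespace and the dropped string are hypothetical, introduced only for this illustration; whether whitespace should be discarded is an assumption about the desired output.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original posts): splits `flow` on
// ' ', '\n', '(' and ')'. Parentheses are kept as one-character tokens;
// spaces and newlines only separate tokens and are discarded.
std::vector<std::string> tokenize_skip_whitespace(const std::string& flow)
{
    const std::string delims = " \n()";
    const std::string dropped = " \n"; // assumption: whitespace is not wanted in the output
    std::vector<std::string> tokens;
    std::istringstream f(flow);
    bool last_was_delim = true;
    char c;
    while (f.get(c))
    {
        if (delims.find(c) != std::string::npos)
        {
            // Delimiter: store it unless it is in the dropped set.
            if (dropped.find(c) == std::string::npos)
                tokens.emplace_back(1, c);
            last_was_delim = true;
        }
        else
        {
            if (last_was_delim)
                tokens.emplace_back(1, c); // start new string
            else
                tokens.back() += c;        // append to existing string
            last_was_delim = false;
        }
    }
    return tokens;
}
```

Called with "push int32(42)", this should yield push, int32, (, 42 and ) as five separate tokens. Setting dropped to an empty string restores the behaviour of the loop above, where every delimiter is stored.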

Equivalently, you can use flow control instead of a bool: a nested while (f.get(c)) loop handles the additional characters of an in-progress token:

std::vector<std::string> tokens;
std::string delims = " \n()";
char c;
while (f.get(c))
    if (delims.find(c) != std::string::npos)
        tokens.emplace_back(1, c);
    else
    {
        tokens.emplace_back(1, c); // start new string
        while (f.get(c))
            if (delims.find(c) != std::string::npos)
            {
                tokens.emplace_back(1, c);
                break;
            }
            else
                tokens.back() += c; // append to existing string
    }

Or, if you like goto statements:

std::vector<std::string> tokens;
std::string delims = " \n()";
char c;
while (f.get(c))
    if (delims.find(c) != std::string::npos)
      add_token:
        tokens.emplace_back(1, c);
    else
    {
        tokens.emplace_back(1, c); // start new string
        while (f.get(c))
            if (delims.find(c) != std::string::npos)
                goto add_token;
            else
                tokens.back() += c; // append to existing string
    }

Which is "easier" to grok is debatable....

Upvotes: 2

Wintermute

Reputation: 44073

I'd use a regular expression for this:

#include <regex>
#include <string>
#include <vector>

std::vector<std::string> getToken(std::string const &flow) {
  // Delimiter regex. Depending on your desired behavior, you may want to
  // remove the + from it; with the +, it will combine adjacent delimiters
  // into one. That is to say, "foo (\n) bar" will be tokenized into "foo",
  // "bar" instead of "foo", "", "", "", "", "bar".
  std::regex re("[ \n()]+");

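  // range-construct result vector from regex_token_iterators; submatch
  // index -1 selects the text between matches, so the delimiters
  // themselves are not part of the result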
  return std::vector<std::string>(
      std::sregex_token_iterator(flow.begin(), flow.end(), re, -1),
      std::sregex_token_iterator()
    );
}
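
Since the question asks for the delimiters to be stored as well, here is a hedged variant of the same idea. It is a sketch, not part of the original answer: the function name tokenize_keep_delims is hypothetical, and the pattern matches either one delimiter character or a run of non-delimiter characters, so iterating over whole matches (submatch index 0) returns both kinds of token.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical variant: returns words and delimiters alike, in input order.
std::vector<std::string> tokenize_keep_delims(const std::string& flow)
{
    // Either a single delimiter, or one or more non-delimiter characters.
    static const std::regex re("[ \n()]|[^ \n()]+");

    // Submatch index 0 selects each full match instead of the gaps between matches.
    return std::vector<std::string>(
        std::sregex_token_iterator(flow.begin(), flow.end(), re, 0),
        std::sregex_token_iterator());
}
```

For "push int32(42)" this should produce push, a single space, int32, (, 42 and ). Every delimiter becomes its own one-character token, so two adjacent spaces give two tokens.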

Upvotes: 3
