Reputation: 921
I know there are many topics with problems similar to mine, but I can't find the right answer for my particular case.
I would like to split my string into tokens using multiple delimiters (' ', '\n', '(', ')') and save everything in my vector (even the delimiters).
Here's the first code I wrote. It currently just splits the input into lines, but now I would like to split on the other delimiters as well.
    std::vector<std::string> Lexer::getToken(std::string flow)
    {
        std::string token;
        std::vector<std::string> tokens;
        std::stringstream f;
        f << flow;
        while (std::getline(f, token, '\n'))
        {
            tokens.push_back(token);
        }
        return (tokens);
    }
Example: if I have:
push int32(42)
I would like to have the following tokens:
push
int32
(
42
)
Upvotes: 1
Views: 1356
Reputation: 106254
You can do this using per-character logic if you think through the states involved....
    // f is the std::stringstream from the question
    std::vector<std::string> tokens;
    std::string delims = " \n()";
    char c;
    bool last_was_delim = true;
    while (f.get(c))
        if (delims.find(c) != std::string::npos)
        {
            tokens.emplace_back(1, c);      // the delimiter is a token of its own
            last_was_delim = true;
        }
        else
        {
            if (last_was_delim)
                tokens.emplace_back(1, c);  // start new string
            else
                tokens.back() += c;         // append to existing string
            last_was_delim = false;
        }
Obviously this considers, say, "((" or "  " (two spaces) to be repeated distinct delimiters, to be entered into tokens separately. Tune to taste if necessary.
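As one possible way to "tune to taste", here is a minimal sketch that produces the token list from the question (push, int32, (, 42, )): whitespace ends a token but is not stored, while parentheses are stored as tokens. The function name `tokenize` and the `kept`/`dropped` parameters are made up for illustration and are not part of the original code.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: characters in `dropped` only separate tokens,
// characters in `kept` separate tokens and are stored as tokens themselves.
std::vector<std::string> tokenize(const std::string& flow,
                                  const std::string& kept = "()",
                                  const std::string& dropped = " \n")
{
    std::vector<std::string> tokens;
    bool in_word = false;  // true while the last token is still being extended
    for (char c : flow)
    {
        if (dropped.find(c) != std::string::npos)
        {
            in_word = false;            // ends the current word, stores nothing
        }
        else if (kept.find(c) != std::string::npos)
        {
            tokens.emplace_back(1, c);  // delimiter becomes its own token
            in_word = false;
        }
        else if (in_word)
        {
            tokens.back() += c;         // append to existing string
        }
        else
        {
            tokens.emplace_back(1, c);  // start new string
            in_word = true;
        }
    }
    return tokens;
}
```

With these defaults, `tokenize("push int32(42)")` should yield the five tokens listed in the question. Passing `" \n()"` as `kept` and an empty `dropped` string gives back the keep-everything behaviour of the loop above.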
Equivalently, but using flow control instead of a bool: a separate inner while (f.get(c)) loop handles additional characters for an in-progress token:
    std::vector<std::string> tokens;
    std::string delims = " \n()";
    char c;
    while (f.get(c))
        if (delims.find(c) != std::string::npos)
            tokens.emplace_back(1, c);
        else
        {
            tokens.emplace_back(1, c);  // start new string
            while (f.get(c))
                if (delims.find(c) != std::string::npos)
                {
                    tokens.emplace_back(1, c);  // delimiter ends the token
                    break;
                }
                else
                    tokens.back() += c;  // append to existing string
        }
Or, if you like goto statements:
    std::vector<std::string> tokens;
    std::string delims = " \n()";
    char c;
    while (f.get(c))
        if (delims.find(c) != std::string::npos)
    add_token:
            tokens.emplace_back(1, c);
        else
        {
            tokens.emplace_back(1, c);  // start new string
            while (f.get(c))
                if (delims.find(c) != std::string::npos)
                    goto add_token;      // store the delimiter, resume outer loop
                else
                    tokens.back() += c;  // append to existing string
        }
Which is "easier" to grok is debatable....
Upvotes: 2
Reputation: 44073
I'd use a regular expression for this:
    #include <regex>
    #include <string>
    #include <vector>

    std::vector<std::string> getToken(std::string const &flow) {
        // Delimiter regex. Depending on your desired behavior, you may want to
        // remove the + from it; with the +, it will combine adjacent delimiters
        // into one. That is to say, "foo (\n) bar" will be tokenized into "foo",
        // "bar" instead of "foo", "", "", "", "", "bar".
        // Note that the delimiters themselves are not part of the result.
        std::regex re("[ \n()]+");
        // range-construct result vector from regex_token_iterators;
        // submatch index -1 selects the text between the matches
        return std::vector<std::string>(
            std::sregex_token_iterator(flow.begin(), flow.end(), re, -1),
            std::sregex_token_iterator()
        );
    }
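Since the question asks to keep the delimiters too, here is a hedged sketch of a variant that does so. It passes the submatch list {-1, 0} to the token iterator, so each step yields the text before a delimiter and then the delimiter itself. The function name `getTokenKeepingDelims` is made up for illustration, and the + is dropped so that every delimiter becomes its own token.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical variant: returns the pieces between delimiters and the
// delimiters themselves, in input order, skipping empty pieces.
std::vector<std::string> getTokenKeepingDelims(std::string const &flow) {
    // One delimiter per match (no +), so "((" gives two "(" tokens.
    static const std::regex re("[ \n()]");
    std::vector<std::string> tokens;
    // -1 selects the text before each match, 0 selects the match itself.
    std::sregex_token_iterator it(flow.begin(), flow.end(), re, {-1, 0});
    std::sregex_token_iterator end;
    for (; it != end; ++it) {
        std::string piece = it->str();
        if (!piece.empty())  // adjacent delimiters produce empty in-between pieces
            tokens.push_back(piece);
    }
    return tokens;
}
```

For "push int32(42)" this should give "push", " ", "int32", "(", "42", ")". To leave out the whitespace tokens, filter them in the loop alongside the empty pieces.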
Upvotes: 3