Reputation: 589
I need to split a line based on two separators: ' ' and ';'.
For example:
input : " abc ; def hij klm "
output: {"abc","def","hij","klm"}
How can I fix the function below to discard the first empty element?
std::vector<std::string> Split(std::string const& line) {
std::regex seps("[ ;]+");
std::sregex_token_iterator rit(line.begin(), line.end(), seps, -1);
return std::vector<std::string>(rit, std::sregex_token_iterator());
}
// input : " abc ; def hij klm "
// output: {"","abc","def","hij","klm"}
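A minimal sketch of why that first empty element appears. With submatch index -1, std::sregex_token_iterator yields the text between separator matches, starting with the text before the first match. Because this input begins with a separator, that first piece is empty. An empty piece after the last match is not produced, which is why there is no trailing "". The helper name RawTokens is hypothetical; it is simply the function above, unfiltered.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the raw -1 (non-matching) tokens, unfiltered.
std::vector<std::string> RawTokens(std::string const& line) {
    std::regex seps("[ ;]+");
    // -1 selects the text between matches of seps, not the matches themselves.
    std::sregex_token_iterator rit(line.begin(), line.end(), seps, -1), end;
    return std::vector<std::string>(rit, end);
}

// RawTokens(" abc")     -> {"", "abc"}     leading separator gives an empty first token
// RawTokens("abc ")     -> {"abc"}         trailing separator gives no empty last token
// RawTokens("abc;;def") -> {"abc", "def"}  "[ ;]+" treats a run of separators as one
```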
Below is a complete sample that compiles:
#include <iostream>
#include <string>
#include <vector>
#include <regex>
std::vector<std::string> Split(std::string const& line) {
std::regex seps("[ ;]+");
std::sregex_token_iterator rit(line.begin(), line.end(), seps, -1);
return std::vector<std::string>(rit, std::sregex_token_iterator());
}
int main()
{
std::string line = " abc ; def hij klm ";
std::cout << "input: \"" << line << "\"" << std::endl;
auto collection = Split(line);
std::cout << "output: {";
auto bComma = false;
for (auto const& oneField : collection)
{
std::cout << (bComma ? "," : "") << "\"" << oneField << "\"";
bComma = true;
}
std::cout << "} " << std::endl;
}
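A different approach, not taken in the answers below, is sketched here as an assumption-free alternative: instead of matching the separators and asking for the text between them (-1), match the tokens themselves with the negated class "[^ ;]+" and keep the whole match (submatch 0, the default). Empty tokens then cannot occur at all. The name SplitTokens is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Alternative sketch: each match of "[^ ;]+" is one token, so there is
// no leading (or any other) empty element to discard.
std::vector<std::string> SplitTokens(std::string const& line) {
    std::regex token("[^ ;]+");
    // Submatch 0 (the default) yields the entire match.
    std::sregex_token_iterator rit(line.begin(), line.end(), token), end;
    return std::vector<std::string>(rit, end);
}
```

A line made only of separators, such as " ; ", produces no matches and therefore an empty vector.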
Upvotes: 4
Views: 695
Reputation: 589
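In case someone wants to copy it, here is the function revised based on Jerry Coffin's answer, using std::remove_copy_if:
#include <algorithm> // std::remove_copy_if
#include <iterator>  // std::back_inserter
#include <regex>
#include <string>
#include <vector>
std::vector<std::string> SplitLine(std::string const& line, std::regex const& seps)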
{
std::sregex_token_iterator rit(line.begin(), line.end(), seps, -1);
std::vector<std::string> tokens;
std::remove_copy_if(rit, std::sregex_token_iterator(),
std::back_inserter(tokens),
[](std::string const &s) { return s.empty(); });
return tokens;
}
Upvotes: 0
Reputation: 490098
I can see a couple of possibilities beyond what's been mentioned in the other answers so far. The first would be to use std::remove_copy_if when building your vector:
// regex stuff here
std::vector<std::string> tokens;
std::remove_copy_if(rit, std::sregex_token_iterator(),
std::back_inserter(tokens),
[](std::string const &s) { return s.empty(); });
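For comparison, a sketch of the same filtering written with std::copy_if and the opposite predicate: remove_copy_if copies the elements for which the predicate is false, so copy_if with "not empty" is equivalent. The function name SplitCopyIf is hypothetical. Taking the token as std::ssub_match rather than std::string lets the predicate check the length without first building a temporary string.

```cpp
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Sketch: keep only non-empty tokens, using copy_if instead of remove_copy_if.
std::vector<std::string> SplitCopyIf(std::string const& line, std::regex const& seps) {
    std::sregex_token_iterator rit(line.begin(), line.end(), seps, -1), end;
    std::vector<std::string> tokens;
    std::copy_if(rit, end, std::back_inserter(tokens),
                 // Each token is a std::ssub_match; its length() is 0 for the leading "".
                 [](std::ssub_match const& m) { return m.length() != 0; });
    return tokens;
}
```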
Another possibility would be to create a locale that classifies characters appropriately, and just read from there:
struct reader: std::ctype<char> {
reader(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
rc[' '] = std::ctype_base::space;
rc[';'] = std::ctype_base::space;
// at a guess, newlines are probably still separators too:
rc['\n'] = std::ctype_base::space;
return &rc[0];
}
};
Once we have this, we tell the stream to use that locale when reading from (or writing to) the stream:
std::stringstream input(" abc ; def hij klm ");
input.imbue(std::locale(std::locale(), new reader));
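Putting the facet and the imbue call together, a minimal self-contained sketch that collects the tokens into a vector instead of printing them. The reader struct is copied from the answer; the helper name SplitWithLocale is hypothetical.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// ctype facet from the answer: only ' ', ';' and '\n' count as whitespace.
struct reader : std::ctype<char> {
    reader() : std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask> rc(table_size, std::ctype_base::mask());
        rc[' '] = std::ctype_base::space;
        rc[';'] = std::ctype_base::space;
        rc['\n'] = std::ctype_base::space;
        return &rc[0];
    }
};

// Hypothetical helper: operator>> skips "whitespace" as defined by the facet.
std::vector<std::string> SplitWithLocale(std::string const& line) {
    std::istringstream input(line);
    // The locale takes ownership of the facet (refs == 0), so no delete is needed.
    input.imbue(std::locale(std::locale(), new reader));
    return std::vector<std::string>(std::istream_iterator<std::string>(input),
                                    std::istream_iterator<std::string>());
}
```

With this table, every other character's mask is left empty, so a tab, for instance, is not a separator and stays inside a token.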
Then we probably want to clean up the code for inserting commas only between tokens, rather than after every token. Fortunately, I wrote some code to handle that fairly neatly some time ago (an infix_ostream_iterator, which is not part of the standard library). Using it, we can copy tokens from the input above to standard output fairly simply:
std::cout << "{ ";
std::copy(std::istream_iterator<std::string>(input), {},
infix_ostream_iterator<std::string>(std::cout, ", "));
std::cout << " }";
Result: "{ abc, def, hij, klm }", exactly as you'd expect (or hope for), without any extra kludges to make up for starting out by doing the wrong thing.
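The answerer's infix_ostream_iterator is not reproduced in this thread. To give a rough idea of the technique only, here is a hypothetical minimal output iterator, deliberately named delimited_ostream_iterator so it is not mistaken for the original, that writes the delimiter before every element except the first. JoinTokens is likewise a hypothetical usage helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in, not the answerer's infix_ostream_iterator:
// writes `delim` between consecutive elements, never before the first or after the last.
template <class T>
class delimited_ostream_iterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    delimited_ostream_iterator(std::ostream& os, const char* delim)
        : os_(&os), delim_(delim) {}

    delimited_ostream_iterator& operator=(T const& value) {
        if (!first_) *os_ << delim_;  // separator only once something was written
        *os_ << value;
        first_ = false;
        return *this;
    }
    // Dereference and increment are no-ops, as with std::ostream_iterator.
    delimited_ostream_iterator& operator*() { return *this; }
    delimited_ostream_iterator& operator++() { return *this; }
    delimited_ostream_iterator& operator++(int) { return *this; }

private:
    std::ostream* os_;
    const char* delim_;
    bool first_ = true;
};

// Example use: join tokens as "abc, def, ..." with no trailing delimiter.
std::string JoinTokens(std::vector<std::string> const& tokens) {
    std::ostringstream out;
    std::copy(tokens.begin(), tokens.end(),
              delimited_ostream_iterator<std::string>(out, ", "));
    return out.str();
}
```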
Upvotes: 3
Reputation: 180490
If you do not want to remove the elements from the vector after you populate it, you can also traverse the iterator range and build the vector while skipping the empty matches, like this:
std::vector<std::string> Split(std::string const& line) {
std::regex seps("[ ;]+");
std::sregex_token_iterator rit(line.begin(), line.end(), seps, -1), end;
std::vector<std::string> tokens;
for (; rit != end; ++rit) {
if (rit->length() != 0) // skip empty tokens, such as the leading ""
tokens.push_back(*rit);
}
return tokens;
}
Upvotes: 1
Reputation: 117856
You could always add an extra step at the end of the function to prune out the empty strings altogether, using the erase-remove idiom:
std::vector<std::string> Split(std::string const& line) {
std::regex seps("[ ;]+");
std::sregex_token_iterator rit(line.begin(), line.end(), seps, -1);
auto tokens = std::vector<std::string>(rit, std::sregex_token_iterator());
tokens.erase(std::remove_if(tokens.begin(),
tokens.end(),
[](std::string const& s){ return s.empty(); }),
tokens.end());
return tokens;
}
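If C++20 is available, std::erase_if (declared in <vector>) performs the same erase-remove step in one call. A sketch, assuming the compiler advertises the library feature through the __cpp_lib_erase_if macro and otherwise falling back to the erase-remove idiom above; the name SplitErase is hypothetical.

```cpp
#include <algorithm>
#include <regex>
#include <string>
#include <vector>

std::vector<std::string> SplitErase(std::string const& line) {
    std::regex seps("[ ;]+");
    std::sregex_token_iterator rit(line.begin(), line.end(), seps, -1), end;
    std::vector<std::string> tokens(rit, end);
    auto is_empty = [](std::string const& s) { return s.empty(); };
#if defined(__cpp_lib_erase_if)
    // C++20: erase and remove_if combined in a single library call.
    std::erase_if(tokens, is_empty);
#else
    // Pre-C++20 fallback: the classic erase-remove idiom.
    tokens.erase(std::remove_if(tokens.begin(), tokens.end(), is_empty), tokens.end());
#endif
    return tokens;
}
```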
Upvotes: 2