Reputation: 77
I'm looking for a way to split a string on multiple delimiters using regex in C++ without losing the delimiters: the output should contain both the delimiters and the split parts, in their original order. For example:
Input
aaa,bbb.ccc,ddd-eee;
Output
aaa , bbb . ccc , ddd - eee ;
I've found some solutions for this, but all of them are in C# or Java. I'm looking for a C++ solution, preferably without using Boost.
Upvotes: 4
Views: 5965
Reputation: 15905
You could build your solution on top of the example for regex_iterator. If, for example, you know your delimiters are comma, period, semicolon, and hyphen, you could use a regex that captures either a delimiter or a series of non-delimiters:
([.,;-]|[^.,;-]+)
Drop that into the sample code and you end up with something like this:
#include <iostream>
#include <string>
#include <regex>
int main ()
{
    // the following two lines are edited; the remainder are directly from the reference.
    std::string s ("aaa,bbb.ccc,ddd-eee;");
    std::regex e ("([.,;-]|[^.,;-]+)"); // matches delimiters or consecutive non-delimiters
    std::regex_iterator<std::string::iterator> rit ( s.begin(), s.end(), e );
    std::regex_iterator<std::string::iterator> rend;
    while (rit!=rend) {
        std::cout << rit->str() << std::endl;
        ++rit;
    }
    return 0;
}
Try substituting in any other regular expressions you like.
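If you want the pieces in a container rather than printed, the same loop can be wrapped in a small function. This is only a sketch: the name tokenize is made up for illustration and is not part of the reference sample. It uses std::sregex_iterator, the standard alias for a regex_iterator over const std::string iterators.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the reference sample): collects every match
// of `re` in `input`, in order of appearance.
std::vector<std::string> tokenize(const std::string& input, const std::regex& re)
{
    std::vector<std::string> tokens;
    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it) {
        tokens.push_back(it->str()); // the full text of the current match
    }
    return tokens;
}
```

Calling tokenize("aaa,bbb.ccc,ddd-eee;", std::regex("([.,;-]|[^.,;-]+)")) should give back aaa , bbb . ccc , ddd - eee ; as separate elements.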
Upvotes: 12
Reputation: 174874
For your case, splitting the input string at every word boundary \b except the one at the start of the string will give you the desired output.
(?!^)\b
OR
(?<=\W)(?!$)|(?!^)(?=\W)
(?<=\W)(?!$)
Matches the position immediately after a non-word character, except at the end of the string.
|
OR
(?!^)(?=\W)
Matches the position immediately before a non-word character, except at the start of the string.
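In a C++ string literal, escape each backslash one more time (for example "\\b") or use a raw string literal. Note that std::regex's default ECMAScript grammar does not support lookbehind, so the second pattern, which uses (?<=\W), needs a regex engine that does.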
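Because std::regex's default ECMAScript grammar has no lookbehind, a rough standard-library counterpart is to match the pieces instead of splitting at the boundaries between them. The sketch below swaps in the pattern \w+|\W (a run of word characters, or a single non-word character); like the second pattern above, it makes every non-word character its own piece. The function name split_words_and_symbols is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns runs of word characters and individual
// non-word characters, in the order they appear in `input`.
std::vector<std::string> split_words_and_symbols(const std::string& input)
{
    // \w+ matches a run of word characters; \W matches one non-word character.
    static const std::regex token(R"(\w+|\W)");
    std::vector<std::string> parts;
    for (std::sregex_iterator it(input.begin(), input.end(), token), end; it != end; ++it) {
        parts.push_back(it->str());
    }
    return parts;
}
```

With the input from the question, this should produce aaa , bbb . ccc , ddd - eee ; as separate elements. Unlike an explicit delimiter list, it treats every non-word character (including spaces) as a delimiter.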
Upvotes: 2