Reputation: 2933
I would like to split a string like this one
“this1245is@g$0,therhsuidthing345”
using a list of words like the one bellow
{“this”, “is”, “the”, “thing”}
into this list
{“this”, “1245”, “is”, “@g$0,”, “the”, “rhsuid”, “thing”, “345”}
// ^--------------^---------------^------------------^-- these were the delimiters
The delimiters are allowed to appear more than once in the string to split, and it can be done using regular expressions
The precedence is in the order in which the delimiters appear in the array
The platform I'm developing for has no support for the Boost library
Update
This is what I have for the moment
#include <iostream>
#include <string>
#include <regex>
int main ()
{
std::string s ("this1245is@g$0,therhsuidthing345");
std::string delimiters[] = {"this", "is", "the", "thing"};
for (int i=0; i<4; i++) {
std::string delimiter = "(" + delimiters[i] + ")(.*)";
std::regex e (delimiter); // matches words beginning by the i-th delimiter
// default constructor = end-of-sequence:
std::sregex_token_iterator rend;
std::cout << "1st and 2nd submatches:";
int submatches[] = { 1, 2 };
std::sregex_token_iterator c ( s.begin(), s.end(), e, submatches );
while (c!=rend) std::cout << " [" << *c++ << "]";
std::cout << std::endl;
}
return 0;
}
output:
1st and 2nd submatches:[this][x1245fisA@g$0,therhsuidthing345]
1st and 2nd submatches:[is][x1245fisA@g$0,therhsuidthing345]
1st and 2nd submatches:[the][rhsuidthing345]
1st and 2nd submatches:[thing][345]
I think I need to make some recursive thing to call on each iteration
Upvotes: 1
Views: 1402
Reputation: 672
Build the expression you want for matches only (re
), then pass in {-1, 0}
to your std::sregex_token_iterator
to return all non-matches (-1
) and matches (0
).
#include <iostream>
#include <regex>
int main() {
std::string s("this1245is@g$0,therhsuidthing345");
std::regex re("(this|is|the|thing)");
std::sregex_token_iterator iter(s.begin(), s.end(), re, { -1, 0 });
std::sregex_token_iterator end;
while (iter != end) {
//Works in vc13, clang requires you increment separately,
//haven't gone into implementation to see if/how ssub_match is affected.
//Workaround: increment separately.
//std::cout << "[" << *iter++ << "] ";
std::cout << "[" << *iter << "] ";
++iter;
}
}
Upvotes: 3
Reputation: 70382
I don't know how to perform the precedence requirement. This seems to work on the given input:
std::vector<std::string> parse (std::string s)
{
std::vector<std::string> out;
std::regex re("\(this|is|the|thing).*");
std::string word;
auto i = s.begin();
while (i != s.end()) {
std::match_results<std::string::iterator> m;
if (std::regex_match(i, s.end(), m, re)) {
if (!word.empty()) {
out.push_back(word);
word.clear();
}
out.push_back(std::string(m[1].first, m[1].second));
i += out.back().size();
} else {
word += *i++;
}
}
if (!word.empty()) {
out.push_back(word);
}
return out;
}
Upvotes: 2