Matthias

Include in std::regex search, exclude from std::sub_match using std::regex_token_iterator

I want to tokenize a std::string using whitespace characters as delimiters. Between a pair of double quotes, however, whitespace should not act as a delimiter, and the quoted text must not contain any further quotes. To achieve this, I use the following regex (written as a raw string literal):

R"((\"[^\"]*\")|\S+)"

which gives the following output when used as the std::regex of a std::sregex_token_iterator:

Test sample:

#include <iostream>
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>

int main() {
   std::string text = "Quick \"\"\"\" \"brown fox\".";
   // Either a quoted run without inner quotes, or a run of non-whitespace.
   std::regex re(R"((\"[^\"]*\")|\S+)");
   // Submatch index 0 yields the whole text of every match.
   std::copy(std::sregex_token_iterator(text.cbegin(), text.cend(), re, 0),
             std::sregex_token_iterator(),
             std::ostream_iterator<std::string>(std::cout, "\n"));
}

Test output:

Quick
""
""
"brown fox"
.

The sub-matches include the surrounding quotes, and I want to get rid of them. I could obviously post-process the iterated sub-matches manually, but is it possible to exclude the surrounding quotes using only the std::regex and the std::sregex_token_iterator, and if so, how?

Changelog: I simplified the regex thanks to YSC.

Answers (1)

Igor Tandetnik

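Something along these lines, perhaps: capture the quoted contents and the unquoted token in separate groups, then print whichever group matched: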

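#include <iostream>
#include <algorithm>
#include <iterator>
#include <regex>
#include <string>

int main() {
   std::string text = "Quick \"\"\"\" \"brown fox\".";
   // Group 1: a quoted token; group 2: its contents without the quotes;
   // group 3: an unquoted run of non-whitespace characters.
   std::regex re(R"((\"([^\"]*)\")|(\S+))");
   std::transform(
       std::sregex_iterator(text.cbegin(), text.cend(), re),
       std::sregex_iterator(),
       std::ostream_iterator<std::string>(std::cout, "\n"),
       // Exactly one of groups 2 and 3 takes part in each match. Testing
       // matched (rather than length) states that intent for empty quotes too.
       [](const std::smatch& m) { return m[2].matched ? m[2].str() : m[3].str(); });
}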
