Jai Prabhu

Reputation: 207

C++ regular expression to match a string

I'm not very good with regular expressions, so I would appreciate help finding the right regular expression to capture the three elements of a string in this format:

<element1>[<element2>="<element3>"]

I could use Boost if needed. The delimiters in this string are '[', '=', ']', '"' and ' '.

Update: This is what I have tried so far:

int main(void) {

   std::string subject("foo[bar=\"baz\"]");
   try {
      std::regex re("([a-zA-Z]+)[([a-zA-Z])=");
      std::sregex_iterator next(subject.begin(), subject.end(), re);
      std::sregex_iterator end;
      while (next != end) {
         std::smatch match = *next;
         std::cout << match.str() << std::endl;
         next++;
      }
   } catch (std::regex_error& e) {
      std::cout << "Error!" << std::endl;
   }
}

However, this gives me:

foo[
bar
baz

Upvotes: 1

Views: 16031

Answers (2)

wally

Reputation: 11002

You could use [<\[" ]?([^<>\[\]" =\x0a\x0d]+)[>\[" ]? to get the elements:

#include <string>
#include <vector>
#include <regex>
#include <iostream>

auto input_text{
R"(foo[bar="baz"]
<element1>[<element2>="<element3>"])"};

auto fromString(const std::string& str) {
    std::vector<std::string> elements;

    // Optional leading delimiter, one captured run of non-delimiter
    // characters, then an optional trailing delimiter.
    std::regex r{R"([<\[" ]?([^<>\[\]" =\x0a\x0d]+)[>\[" ]?)"};
    auto it = std::sregex_iterator(str.begin(), str.end(), r);
    auto end = std::sregex_iterator();
    for(; it != end; ++it) {
        // Group 1 holds the element without its surrounding delimiters.
        elements.push_back((*it)[1].str());
    }
    return elements;
}

int main()
{
    auto result = fromString(input_text);
    for (const auto& t : result) {
        std::cout << t << '\n';
    }

    return 0;
}

Output:

foo
bar
baz
element1
element2
element3


Upvotes: 1

Galik

Reputation: 48605

You don't need iterators for this. You can match the whole string with a single expression and use capture groups, written as (<capture>), which return sub-matches like this:

// Note: Raw string literal R"~()~" removes the need to escape the string
std::regex const e{R"~(([^[]+)\[([^=]+)="([^"]+)"\])~"}; 
//                     ^  1  ^  ^  2  ^  ^  3  ^
//                     |     |  |     |  |_____|------- sub_match #3
//                     |     |  |     |
//                     |     |  |_____|---------------- sub_match #2
//                     |     |
//                     |_____|------------------------- sub_match #1

std::string s(R"~(foo[bar="baz"])~"); // Raw string literal again

std::smatch m;

if(std::regex_match(s, m, e))
{
    std::cout << m[1] << '\n'; // sub_match #1
    std::cout << m[2] << '\n'; // sub_match #2
    std::cout << m[3] << '\n'; // sub_match #3
}

Upvotes: 4
