Simon Ninon

Reputation: 2451

C++ Regex: non-greedy match

I'm currently trying to make a regex which matches URL parameters and extracts them.

For example, given the parameter string ?param1=someValue&param2=someOtherValue, std::regex_match should extract param1, someValue, param2 and someOtherValue.

After trying different regex patterns, I finally built one corresponding to what I want: std::regex("(?:[\\?&]([^=&]+)=([^=&]+))*").

If I take the previous example, std::regex_match matches as expected. However, it does not extract the expected values, keeping only the last captured values.

For example, the following code:

std::regex paramsRegex("(?:[\\?&]([^=&]+)=([^=&]+))*");
std::string arg = "?param1=someValue&param2=someOtherValue";
std::smatch sm;

std::regex_match(arg, sm, paramsRegex);
for (const auto &match : sm)
   std::cout << match << std::endl;

will give the following output (after sm[0], the full match, which the loop prints first):

param2
someOtherValue

As you can see, param1 and its value are skipped and not captured.
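The behaviour described above can be reproduced in isolation. The sketch below uses a hypothetical helper, captures_of_repeated_group (not part of the original code), that returns every sub-match reported by std::regex_match. It illustrates that a capture group inside a repeated group is overwritten on each repetition, so only the values from the last repetition remain once the match finishes.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: runs the quantified pattern from the question and
// returns every sub-match that std::regex_match reports, in order.
std::vector<std::string> captures_of_repeated_group(const std::string &input) {
    static const std::regex params_regex("(?:[\\?&]([^=&]+)=([^=&]+))*");
    std::smatch sm;
    std::vector<std::string> out;
    if (std::regex_match(input, sm, params_regex)) {
        // sm[0] is the whole match; sm[1] and sm[2] are the two capture
        // groups, which only hold the values from the last repetition.
        for (const auto &sub : sm)
            out.push_back(sub.str());
    }
    return out;
}
```

However many parameters the input contains, the result always has three entries: the full match plus one value per capture group.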

After searching on Google, I found that this is apparently due to greedy capture, so I changed my regex to "(?:[\\?&]([^=&]+)=([^=&]+))\\*?" in order to enable non-greedy capturing.

This regex works well when I try it on Rubular, but it does not match when I use it in C++ (std::regex_match returns false and nothing is captured).
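One possible explanation, assuming the pattern is compiled exactly as written above: inside a C++ string literal, "\\*?" reaches the regex engine as \*?, which means an optional literal asterisk rather than a lazy quantifier. The sketch below compares that pattern with the unescaped "*?" form. Both function names are hypothetical and only serve the comparison.

```cpp
#include <regex>
#include <string>

// Pattern exactly as written above: "\\*?" reaches the regex engine as \*?,
// i.e. a single key=value group followed by an optional literal asterisk.
bool matches_with_escaped_star(const std::string &input) {
    static const std::regex re("(?:[\\?&]([^=&]+)=([^=&]+))\\*?");
    return std::regex_match(input, re);
}

// Same pattern with an unescaped "*?", which is the lazy form of "*".
bool matches_with_lazy_star(const std::string &input) {
    static const std::regex re("(?:[\\?&]([^=&]+)=([^=&]+))*?");
    return std::regex_match(input, re);
}
```

With the escaped form, a string holding two parameters cannot match, because the group is no longer repeated. With the lazy form the match succeeds, but since std::regex_match has to consume the whole input, laziness does not change which values end up in the capture groups.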

I've tried different std::regex_constants options (different regex grammar by using std::regex_constants::grep, std::regex_constants::egrep, ...) but the result is the same.

Does anyone know how to do non-greedy regex capture in C++?

Upvotes: 1

Views: 3440

Answers (2)

jstar

Reputation: 877

Try to use match_results::prefix/suffix:

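// Needs <iostream>, <regex> and <string>; in_str is the std::string to scan.
std::string match_expression("your expression");
std::smatch result;
std::regex fnd(match_expression, std::regex_constants::icase);
// Each pass finds the next match, then continues from what follows it.
while (std::regex_search(in_str, result, fnd, std::regex_constants::match_any))
{
    // Index 0 is the whole match, so start at 1 to print only the captures.
    for (std::size_t i = 1; i < result.size(); i++)
    {
        std::cout << result[i].str();
    }
    in_str = result.suffix();
}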
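Applied to the URL parameters from the question, the suffix loop could look like the sketch below. The helper name parse_params_with_suffix and the pattern without a quantifier are assumptions made for illustration, not part of the snippet above.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects key/value pairs by searching repeatedly and
// restarting each search from the suffix of the previous match.
std::vector<std::pair<std::string, std::string>>
parse_params_with_suffix(std::string input) {
    static const std::regex param_regex("[\\?&]([^=&]+)=([^=&]+)");
    std::vector<std::pair<std::string, std::string>> params;
    std::smatch result;
    while (std::regex_search(input, result, param_regex)) {
        params.emplace_back(result[1].str(), result[2].str());
        // Copy the suffix into its own string before overwriting input,
        // because result still refers to input's characters.
        std::string rest = result.suffix().str();
        input = std::move(rest);
    }
    return params;
}
```

Each pass copies the remaining text, so this is simple rather than efficient. It also relies on the pattern never matching an empty string, otherwise the loop would not terminate.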

Upvotes: 0

Simon Ninon

Reputation: 2451

As Casimir et Hippolyte explained in his comment, I just need to:

  • Remove the quantifier
  • Use std::regex_iterator

It gives me the following code:

std::regex paramsRegex("[\\?&]([^=]+)=([^&]+)");
std::string url_params = "?key1=val1&key2=val2&key3=val3&key4=val4";
std::smatch sm;

auto params_it = std::sregex_iterator(url_params.cbegin(), url_params.cend(), paramsRegex);
auto params_end = std::sregex_iterator();

while (params_it != params_end) {
    auto param = params_it->str();

    std::regex_match(param, sm, paramsRegex);
    for (const auto &s : sm)
       std::cout << s << std::endl;

    ++params_it;
}

And here is the output:

?key1=val1
key1
val1
&key2=val2
key2
val2
&key3=val3
key3
val3
&key4=val4
key4
val4

The original regex (?:[\\?&]([^=&]+)=([^=&]+))* was simply changed into [\\?&]([^=]+)=([^&]+).

Then, by using std::sregex_iterator, I get an iterator over each match (?key1=val1, &key2=val2, ...).

Finally, by calling std::regex_match on each matched substring, I can retrieve each parameter's key and value.
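As a possible simplification, each element produced by std::sregex_iterator is already a std::smatch, so the capture groups can be read from it directly without a second std::regex_match call. A minimal sketch, where parse_params is a hypothetical helper name:

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the key/value pairs found in url_params.
std::vector<std::pair<std::string, std::string>>
parse_params(const std::string &url_params) {
    static const std::regex param_regex("[\\?&]([^=]+)=([^&]+)");
    std::vector<std::pair<std::string, std::string>> params;
    auto it = std::sregex_iterator(url_params.cbegin(), url_params.cend(),
                                   param_regex);
    const auto end = std::sregex_iterator();
    for (; it != end; ++it) {
        // *it is a std::smatch: index 1 is the key, index 2 is the value.
        const std::smatch &m = *it;
        params.emplace_back(m[1].str(), m[2].str());
    }
    return params;
}
```

Calling parse_params on the url_params string from the code above yields four pairs, from key1/val1 through key4/val4.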

Upvotes: 4
