Jack

Reputation: 133587

Find only first std::regex match efficiently

I'm trying to find an efficient way to greedily find the first match for a std::regex without analyzing the whole input.

My specific problem is that I wrote a hand-written lexer and I'm trying to provide rules to parse common literal values (e.g. a numeric value).

So suppose a simple rule, say:

std::regex integralRegex = std::regex("([+-]?[1-9]*[0-9]+)");

Is there a way to find the longest match starting from the beginning of input without scanning all of it? It looks like std::regex_match tries to match the whole input while std::regex_search forcefully finds all matches.

Maybe I'm missing a trivial overload for my purpose but I can't find an efficient solution to the problem.

Just to clarify the question: I'm not asking about stopping after the first sub-match. For an input like "51+12*3" I'd like something that finds the first match, 51, and then stops, ignoring whatever comes after.

Upvotes: 4

Views: 2000

Answers (1)

Marek R

Reputation: 37882

First of all, consider [+-]?[1-9]?[0-9]+ instead. I think it does the same thing, but should be a bit faster. Or perhaps you intend to use something like this: [+-]?[1-9][0-9]*|0 (a zero without a sign, or a number that does not start with zero).
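To make the difference between the two suggested patterns concrete, here is a small sketch that checks whole strings against each with std::regex_match. The helper names matches_permissive and matches_strict are made up for illustration and are not from the thread.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: whole-string check against the permissive pattern,
// which accepts any optionally signed run of digits (leading zeros included).
bool matches_permissive(const std::string& text) {
    static const std::regex permissive("[+-]?[1-9]?[0-9]+");
    return std::regex_match(text, permissive);
}

// Hypothetical helper: whole-string check against the stricter pattern,
// which accepts a bare "0" or an optionally signed number with no leading zero.
bool matches_strict(const std::string& text) {
    static const std::regex strict("[+-]?[1-9][0-9]*|0");
    return std::regex_match(text, strict);
}
```

With these, "007" is accepted by the permissive pattern but rejected by the strict one, and "-0" is rejected by the strict one because the sign belongs only to the first alternative.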

Secondly, C++ provides a regular expression iterator:

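#include <iostream>
#include <iterator>
#include <regex>
#include <string>

int main() {
    const std::string s = "51+12*3";

    std::regex number_regex("[+-]?[1-9]?[0-9]+");
    // The iterator finds successive non-overlapping matches one at a time.
    auto words_begin =
        std::sregex_iterator(s.begin(), s.end(), number_regex);
    auto words_end = std::sregex_iterator();

    // Note: std::distance walks every match, so this line scans the whole input.
    std::cout << "Found "
              << std::distance(words_begin, words_end)
              << " numbers:\n";

    for (std::sregex_iterator i = words_begin; i != words_end; ++i) {
        std::smatch match = *i;
        std::string match_str = match.str();
        std::cout << match_str << '\n';
    }
}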

This looks like what you need.

https://wandbox.org/permlink/tkaAfIslkWeY2poo
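For the lexer case in the question, where only a match starting at the current position matters, one option worth considering is std::regex_search with the std::regex_constants::match_continuous flag. The following is a minimal sketch, not part of the answer above; the helper name leading_integer is hypothetical, and it assumes C++17 for std::optional.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: returns the integer literal found at the very start
// of `input`, or std::nullopt if the input does not begin with one.
std::optional<std::string> leading_integer(const std::string& input) {
    static const std::regex integer_regex("[+-]?[1-9]?[0-9]+");
    std::smatch match;
    // match_continuous requires the match to begin at input.begin(),
    // so later start positions in the input are not tried.
    if (std::regex_search(input.begin(), input.end(), match, integer_regex,
                          std::regex_constants::match_continuous)) {
        return match.str();
    }
    return std::nullopt;
}
```

Called on "51+12*3" this should yield "51" and leave the rest of the input for the lexer to handle, while an input such as "abc51" yields no match rather than finding 51 further along. A lexer could advance by match.length() and call the same search again from the new position.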

Upvotes: 2
