Itachi Uchiwa

Reputation: 3164

How to detect C++ valid identifiers using regex?

I am a beginner with regular expressions, although I know the basics of using them for searching, replacing, and so on.

I want to write a program that detects valid C++ identifiers, e.g.:

_ _f _8 my_name age_ var55 x_ a

And so on...

So I've tried this:

#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string str = "9_var 57age my_marks cat33 fit*ell +bin set_";
    std::string pat = "[_a-z]+[[:alnum:]]*";
    std::regex reg(pat, std::regex::icase);
    std::smatch sm;
    if (std::regex_search(str, sm, reg))
        std::cout << sm.str() << '\n';
    else
        std::cout << "no valid C++ identifier found!\n";
}

The output:

_var

But a C++ identifier cannot start with a digit, so 9_var must not be a candidate match. What I see instead is that the regex engine takes only the substring _var from 9_var and treats it as a match. I want to discard a whole word such as "9_var". I need some way to get only matches that start with a letter or an underscore.

So how can I write a program that detects valid identifiers? Thank you!

Upvotes: -1

Views: 792

Answers (1)

Stephen Newell

Reputation: 7838

Your pattern isn't checking for word boundaries, so it can match part of a word, such as the _var inside 9_var. An updated regex looks like this:

std::string pat = "\\b[_a-z]+[[:alnum:]]*\\b";

With only that updated, the match is the first valid identifier in your string.

$ ./a.out 
my_marks

If you want to find all the valid identifiers, you'll need to loop. You'll also need to filter out reserved words, but regex isn't a good solution for that.

Upvotes: 1
