SEAccount

Reputation: 33

Write a C++ regular expression to match a #include preprocessing directive

I want to write a regular expression in C++ to match a #include preprocessing directive, so I wrote this:

std::regex includePattern("^[[:blank:]|[:space:]]*#[[:blank:]|[:space:]]*include[[:blank:]|[:space:]]+[<|\"]{1}[_[:alpha:]]+[_[:alnum:]]*");

This works for:

std::string matchString = "#include <vector>";

But it matches only part of the string, excluding the trailing ">". If I change the regex to this:

std::regex includePattern("^[[:blank:]|[:space:]]*#[[:blank:]|[:space:]]*include[[:blank:]|[:space:]]+[<|\"]{1}[_[:alpha:]]+[_[:alnum:]]*[>|\"]{1}");

it doesn't give me the desired result; it just reports "not found". Is there anything wrong?

Can anybody help me write an accurate regular expression in C++ to match a #include preprocessing directive?

Thanks in advance!

Upvotes: 2

Views: 2445

Answers (4)

James Kanze

Reputation: 153909

It depends on whether the input to be matched may contain new lines. [[:space:]] matches any white space, including new lines; [[:blank:]] matches only horizontal white space (space and tab), and I'm not sure it is supported by the standard. Anyway, something like:

"^\\s*#\\s*include\\s+[<\"][^>\"]*[>\"]\\s*"

should do the trick, but...

  • If your source has new lines where it shouldn't, it still might match.

  • If your source has escaped new lines, say in the middle of the token include or the file name, it won't match. (This is legal C++, but no one in their right mind would do it.)

  • If your source has mismatched delimiters, a " at one end and a < or > at the other, it will still match.

  • And it doesn't handle comments at the end of line. Handling C++ style comments (//) should only be a matter of adding "(?://.*)?" to the end of the expression. Handling C style comments (particularly since there can be several) is a bit more complicated.

To ensure that the delimiters match, you'd probably have to put everything after the include in an alternation:

"^\\s*#\\s*include\\s+(?:<[^>]*>|\"[^\"]*\")\\s*"

Again, you'd need to add to the end to handle comments.
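
As a rough sketch of how the pieces above might fit together, the following combines the delimiter-matching expression with the optional "(?://.*)?" suffix for C++ style comments. The function name isIncludeDirective is hypothetical, and the sketch assumes the input is a single line with no embedded new lines and no C style comments.

```cpp
#include <regex>
#include <string>

// Hypothetical helper for illustration: returns true when `line` is a
// #include directive whose delimiters agree (<...> or "..."), optionally
// followed by a // comment. Assumes one directive per string, no new lines.
bool isIncludeDirective(const std::string& line) {
    static const std::regex pattern(
        "^\\s*#\\s*include\\s+"          // '#', "include", mandatory white space
        "(?:<[^>]*>|\"[^\"]*\")"         // <file> or "file", never mixed
        "\\s*(?://.*)?");                // trailing white space, optional // comment
    // regex_match requires the whole line to match, so no trailing '$' is needed.
    return std::regex_match(line, pattern);
}
```

Calling isIncludeDirective("#include <vector>") should return true, while a line with mismatched delimiters such as #include <vector" should be rejected. Because the expression keeps the \\s+ after include, a directive written without that space (#include<vector>) is not accepted by this sketch.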

Upvotes: 5

Galik

Reputation: 48615

If you need to capture the type of inclusion (< or ") and the included file name, you could use:

std::string reg = "\\s*#\\s*include\\s*([<\"])([^>\"]+)([>\"])"; // escaped version

- or -

std::string raw = R"reg(\s*#\s*include\s*([<"])([^>"]+)([>"]))reg"; // raw string version

Group 1 = `<` or `"`
Group 2 = file name
Group 3 = `>` or `"`
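
A minimal sketch of reading those three groups back out with std::smatch might look like this. The IncludeInfo struct and the parseInclude function are hypothetical names introduced only for illustration; the pattern is the raw string version shown above.

```cpp
#include <regex>
#include <string>

// Hypothetical result type, not part of the answer above.
struct IncludeInfo {
    bool found = false;
    char open = '\0';    // group 1: '<' or '"'
    std::string file;    // group 2: file name
    char close = '\0';   // group 3: '>' or '"'
};

// Hypothetical helper: searches `line` for an include directive and
// copies the three capture groups into an IncludeInfo.
IncludeInfo parseInclude(const std::string& line) {
    static const std::regex pattern(
        R"reg(\s*#\s*include\s*([<"])([^>"]+)([>"]))reg");
    IncludeInfo info;
    std::smatch m;
    if (std::regex_search(line, m, pattern)) {
        info.found = true;
        info.open = m[1].str()[0];   // each delimiter group is exactly one character
        info.file = m[2].str();
        info.close = m[3].str()[0];
    }
    return info;
}
```

Note that this pattern does not check that group 1 and group 3 agree, so the caller would have to compare open and close if mismatched delimiters should be rejected.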

Upvotes: 1

user557597

Reputation:

You aren't validating, are you?
You can probably count on the include coming after the beginning of the line (BOL) and optional spaces, and on it being delimited on its right side by whitespace.
Other than that, I wouldn't try to validate what's to the right of it.

Using only the multi-line modifier:
"(?m)^[^\\S\\r\\n]*#include[^\\S\\r\\n]+(.*?)[^\\S\\r\\n]*"

Expanded:

 (?m)
 ^ [^\S\r\n]* 
 \#include
 [^\S\r\n]+ 
 ( .*? )               # (1)
 [^\S\r\n]* 

Upvotes: 0

Grice

Reputation: 1375

The following regex will match #include directives such as #include <vector>:

^#include\s+<\w+>$

Note: this won't match directives such as #include <stdio.h>, because \w does not match the dot.

Upvotes: -1
