Reputation: 33
I want to write a regular expression in C++ to match a #include preprocessing directive, so I wrote this:
std::regex includePattern("^[[:blank:]|[:space:]]*#[[:blank:]|[:space:]]*include[[:blank:]|[:space:]]+[<|\"]{1}[_[:alpha:]]+[_[:alnum:]]*");
This works for:
std::string matchString = "#include <vector>";
But it only matches part of the string, excluding the trailing ">". If I change the regex to this:
std::regex includePattern("^[[:blank:]|[:space:]]*#[[:blank:]|[:space:]]*include[[:blank:]|[:space:]]+[<|\"]{1}[_[:alpha:]]+[_[:alnum:]]*[>|\"]{1}");
It doesn't give me the desired result at all; it just reports "not found"! Is there anything wrong?
Can anybody help me write an accurate regular expression in C++ to match a #include preprocessing directive?
Thanks in advance!
Upvotes: 2
Views: 2445
Reputation: 153909
It depends on whether the input to be matched may contain new lines. [[:space:]] will match any white space, including new lines; [[:blank:]] will match any white space except new lines (and I'm not sure it is supported by the standard). Anyway, something like:
"^\\s*#\\s*include\\s+[<\"][^>\"]*[>\"]\\s*"
should do the trick, but...
If your source has new lines where it shouldn't, it still might match.
If your source has escaped new lines, say in the middle of the token include or the file name, it won't match. (This is legal C++, but no one in their right mind would do it.)
If your source has mismatched delimiters, a " at one end and a < or a > at the other, it will still match.
And it doesn't handle comments at the end of line. Handling C++ style comments (//) should only be a matter of adding "(?://.*)?" to the end of the expression. Handling C style comments (particularly since there can be several) is a bit more complicated.
To ensure that the delimiters match, you'd probably have to put everything after the include in an or:
"^\\s*#\\s*include\\s+(?:<[^>]*>|\"[^\"]*\")\\s*"
Again, you'd need to append "(?://.*)?" to the end to handle C++ style comments.
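To make the alternation concrete, here is a minimal sketch that combines the delimiter-matching pattern with the optional `//` comment suffix and applies it to one line at a time. The helper name `is_include_directive` is made up for illustration, and the trailing `$` is my addition so that `std::regex_match` has to consume the whole line.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the answer): returns true if `line`
// consists of a single #include directive, optionally followed by a
// C++ style comment. Mismatched delimiters such as <foo" are rejected.
bool is_include_directive(const std::string& line) {
    // Raw string literal, so the backslashes need no doubling.
    static const std::regex pattern(
        R"re(^\s*#\s*include\s+(?:<[^>]*>|"[^"]*")\s*(?://.*)?$)re");
    return std::regex_match(line, pattern);
}
```

Calling `is_include_directive("#include <vector>")` should return true, while a line with a `<` at one end and a `"` at the other should not match. C style comments and escaped new lines are still not handled.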
Upvotes: 5
Reputation: 48615
If you need to capture the type of inclusion (< or ") and the included file name, you could use:
std::string reg = "\\s*#\\s*include\\s*([<\"])([^>\"]+)([>\"])"; // escaped version
- or -
std::string raw = R"reg(\s*#\s*include\s*([<"])([^>"]+)([>"]))reg"; // raw string version
Group 1 = `<` or `"`
Group 2 = file name
Group 3 = `>` or `"`
Upvotes: 1
Reputation:
You aren't validating, are you?
One thing: you might be able to count on includes coming after the beginning of line (BOL) and possible spaces,
and being delimited on the right side by whitespace.
Other than that, I wouldn't try to validate what's on the right of that.
Using Multi-line modifier only -
"(?m)^[^\\S\\r\\n]*#include[^\\S\\r\\n]+(.*?)[^\\S\\r\\n]*"
Expanded:
(?m)
^ [^\S\r\n]*
\#include
[^\S\r\n]+
( .*? ) # (1)
[^\S\r\n]*
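One caveat if this is tried with `std::regex`: as far as I know, its default ECMAScript grammar does not accept inline modifiers such as `(?m)`. A possible workaround, sketched below, is to split the input into lines and match each one in full. The helper name `include_operands` is made up, `[ \t]` is my stand-in for `[^\S\r\n]` (horizontal white space), and the trailing `$` is added so the lazy group stops before trailing blanks.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns whatever follows "#include" on each
// line of `source`, with surrounding blanks removed. No validation
// of the operand is attempted.
std::vector<std::string> include_operands(const std::string& source) {
    // Assumption: [ \t] replaces [^\S\r\n] from the pattern above.
    static const std::regex pattern(
        R"re(^[ \t]*#include[ \t]+(.*?)[ \t]*$)re");
    std::vector<std::string> result;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        // Drop a trailing '\r' left over from Windows line endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        std::smatch m;
        if (std::regex_match(line, m, pattern)) {
            result.push_back(m[1].str());
        }
    }
    return result;
}
```

Because each line is matched whole, the lazy `(.*?)` has to extend up to the last non-blank character, so group 1 holds the operand without trailing spaces.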
Upvotes: 0
Reputation: 1375
The following regex will match #include directives such as #include <vector>:
^#include\s+<\w+>$
Note: this won't match directives such as #include stdio.h.
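A small sketch of that limitation, using a hypothetical helper `matches_simple_include`. I am assuming the header in the note is written as `<stdio.h>`; it fails because `\w` does not match the dot.

```cpp
#include <regex>
#include <string>

// Hypothetical helper wrapping the pattern above. It only accepts
// angle-bracket includes whose name consists of word characters.
bool matches_simple_include(const std::string& line) {
    static const std::regex pattern(R"re(^#include\s+<\w+>$)re");
    return std::regex_match(line, pattern);
}
```

With this, `matches_simple_include("#include <vector>")` should be true, while `"#include <stdio.h>"` and quoted includes should be rejected.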
Upvotes: -1