Reputation: 15173
I need some help trying to match a C include file with full path like so:
#include <stdio.h> -> stdio.h
#include "monkey/chicken.h" -> monkey/chicken.h
So far I have (adapted from another expression I found):
^\s*\#include\s+(["'<])([^"'<>/\|\b]+)*([">])
But I'm kind of stuck at this point: it doesn't match in the second case, and I'm not sure how to get the result of the match (e.g. the file path) back when using regcomp().
BTW I've looked at regexplib.com, but can't find anything suitable.
Edit: Yes I am a total regexp newbie, using POSIX regex with regmatch_t and friends...
Upvotes: 9
Views: 5790
Reputation: 2342
Here's what I did, in Python:
INCLUDE_RE = re.compile(rb'^\s*\#include\s+(?:<(.+)>|"(.+)")', re.MULTILINE)
This is for matching against a byte string or mmap, as per https://stackoverflow.com/a/43060761/15547292
If you want to match against a regular Python string, remove the b prefix.
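As a minimal usage sketch of the pattern above: the helper name find_includes and the sample input are made up for illustration and are not part of the original answer. It shows how to pick whichever of the two groups actually matched.

```python
import re

# Pattern from the answer above, unchanged (bytes pattern, multi-line mode).
INCLUDE_RE = re.compile(rb'^\s*\#include\s+(?:<(.+)>|"(.+)")', re.MULTILINE)

def find_includes(source):
    """Return every included path found in `source` (a bytes-like object)."""
    # findall() yields one (angle, quoted) tuple per match; the group that did
    # not take part in the match comes back as an empty bytes object.
    return [angle or quoted for angle, quoted in INCLUDE_RE.findall(source)]

# Hypothetical sample input, for illustration only.
sample = b'#include <stdio.h>\n  #include "monkey/chicken.h"\nint main(void) { return 0; }\n'
includes = find_includes(sample)
```

Because .+ is greedy, a line that contains a later > or " after the path would capture more than the path; [^>]+ and [^"]+ avoid that if it matters for your input.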
Upvotes: 0
Reputation: 143184
This would give better results:
^\s*\#include\s+["<]([^">]+)[">]
You then want to look at the first capture group when you get a match.
You don't say what language you're using, but the fact that you mention regcomp() leads me to believe that you're using the POSIX regex library in C. If that's right, then you want to use the regexec function and use its nmatch and pmatch parameters to get the first capture group.
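The expression itself can be sanity-checked outside C. The sketch below uses Python's re module, which is an assumption made for convenience: it is not the POSIX engine, so it only verifies the pattern, not the regexec call. In C, the equivalent of group(1) is the substring of the subject from pmatch[1].rm_so up to (not including) pmatch[1].rm_eo, which requires passing nmatch >= 2. Note also that \s is not defined by POSIX ERE itself (glibc accepts it as an extension); [[:space:]] is the portable spelling. The helper name include_path is hypothetical.

```python
import re

# Same expression as above; group 1 is the path between the delimiters.
INCLUDE_RE = re.compile(r'^\s*\#include\s+["<]([^">]+)[">]')

def include_path(line):
    """Return the path from an #include line, or None if it does not match."""
    m = INCLUDE_RE.match(line)
    return m.group(1) if m else None
```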
Upvotes: 6
Reputation: 8392
If you want a more precise solution that also allows comments before the include directive, as in, for example,
/* ops, a comment */ /* oh, another comment */ #include "new_header1.h" /* let's try another with an #include "old_header.h" */
you can use:
^(?:\s*|\s*\/\*.*?\*\/)\s*#include\s*(?:(?:<)(?<PATH>.*?)(?:>)|(?:")(?<PATH>.*?)(?:"))
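This expression relies on non-capturing groups, lazy quantifiers and a named group that is defined twice, none of which POSIX regcomp() supports, so it needs a different engine. As a rough sketch of the same idea in Python's re module (an adaptation, not the original pattern: Python spells named groups (?P<name>...) and rejects duplicate names, so the hypothetical names angle and quote are used instead of PATH):

```python
import re

# Adapted pattern: optional leading comment(s), then either <...> or "...".
INCLUDE_RE = re.compile(
    r'^(?:\s*|\s*/\*.*?\*/)\s*#include\s*(?:<(?P<angle>.*?)>|"(?P<quote>.*?)")'
)

def include_path(line):
    """Return the included path, or None if the line is not an #include."""
    m = INCLUDE_RE.match(line)
    if m is None:
        return None
    # Exactly one of the two named groups takes part in a successful match.
    return m.group('angle') if m.group('angle') is not None else m.group('quote')

# The example line from the answer above.
example = ('/* ops, a comment */ /* oh, another comment */ #include "new_header1.h" '
           '/* let\'s try another with an #include "old_header.h" */')
```

Calling include_path(example) picks up new_header1.h, while a line whose only #include sits inside a comment is rejected.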
Upvotes: 2
Reputation: 10536
Here's what I wrote:
#include ((<[^>]+>)|("[^"]+"))
Does it fit?
Upvotes: 7
Reputation: 1567
Not particularly well tested, but it matches your two cases:
^\s*#include\s+(<([^"'<>|\b]+)>|"([^"'<>|\b]+)")
The only problem is that, because the <...> and "..." forms are separate alternatives, the result could be in capture group 2 or 3, so you should check whether group 2 is empty and, if it is, use group 3. The advantage over some of the other answers is that it won't match something like this: #include "bad.h> or this: #include <bad<<h>
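The "group 2 or group 3" check can be sketched as follows. Python's re module is used here purely to exercise the expression (an assumption; inside a character class Python reads \b as a backspace character, which does not affect these inputs), and the helper name include_path is hypothetical.

```python
import re

# Same expression as above: group 2 holds a <...> path, group 3 a "..." path.
INCLUDE_RE = re.compile(r'''^\s*#include\s+(<([^"'<>|\b]+)>|"([^"'<>|\b]+)")''')

def include_path(line):
    """Return the included path, or None if the line does not match."""
    m = INCLUDE_RE.match(line)
    if m is None:
        return None
    # Only one of the two alternatives can match, so fall back to group 3
    # when group 2 did not take part.
    return m.group(2) if m.group(2) is not None else m.group(3)
```

With POSIX regexec the same test is pmatch[2].rm_so == -1, which marks a group that did not take part in the match.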
And here's an example of how to use (wrap) regcomp & friends:
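#include <regex.h>
#include <string>
#include <vector>

// Returns true if sSubject matches sRegEx. If vCaptureGroups is non-null, it is
// filled with the whole match (index 0) followed by each capture group.
static bool regexMatch(const std::string& sRegEx, const std::string& sSubject, std::vector<std::string> *vCaptureGroups)
{
    regex_t re;
    int flags = REG_EXTENDED | REG_ICASE;
    int status;
    if(!vCaptureGroups) flags |= REG_NOSUB;
    if(regcomp(&re, sRegEx.c_str(), flags) != 0)
    {
        return false;
    }
    if(vCaptureGroups)
    {
        const size_t mlen = re.re_nsub + 1; // +1 because entry 0 is the whole match
        regmatch_t *rawMatches = new regmatch_t[mlen];
        status = regexec(&re, sSubject.c_str(), mlen, rawMatches, 0);
        vCaptureGroups->clear();
        vCaptureGroups->reserve(mlen);
        if(status == 0)
        {
            for(size_t i = 0; i < mlen; i++)
            {
                if(rawMatches[i].rm_so == -1)
                {
                    // This group did not take part in the match.
                    vCaptureGroups->push_back(std::string());
                    continue;
                }
                // rm_eo is one past the last matched character, so the length is rm_eo - rm_so.
                vCaptureGroups->push_back(sSubject.substr(rawMatches[i].rm_so, rawMatches[i].rm_eo - rawMatches[i].rm_so));
            }
        }
        delete[] rawMatches;
    }
    else
    {
        status = regexec(&re, sSubject.c_str(), 0, NULL, 0);
    }
    regfree(&re);
    return (status == 0);
}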
Upvotes: 1
Reputation: 43120
You can try this regex:
(^\s*\#\s*include\s*<([^<>]+)>)|(^\s*\#\s*include\s*"([^"]+)")
I prefer to have separate regexes for
#include <>
and
#include ""
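Splitting the two alternatives into two expressions might look like the sketch below. It is written in Python only to exercise the patterns; the names ANGLE_RE, QUOTE_RE and classify_include and the "angle"/"quote" labels are hypothetical. A side benefit of keeping them separate is that the path is always in group 1 and you know which form was used.

```python
import re

# The two halves of the expression above, compiled separately.
ANGLE_RE = re.compile(r'^\s*\#\s*include\s*<([^<>]+)>')
QUOTE_RE = re.compile(r'^\s*\#\s*include\s*"([^"]+)"')

def classify_include(line):
    """Return (form, path) for an #include line, or None if neither form matches."""
    m = ANGLE_RE.match(line)
    if m:
        return ('angle', m.group(1))
    m = QUOTE_RE.match(line)
    if m:
        return ('quote', m.group(1))
    return None
```

The \s* after # also accepts directives written with whitespace between the # and include, which C allows.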
Upvotes: 3