Justicle

Reputation: 15173

Regular expression to match C #include file

I need some help trying to match a C include file with full path like so:

#include <stdio.h>  -> stdio.h
#include "monkey/chicken.h" -> monkey/chicken.h

So far I have (adapted from another expression I found):

^\s*\#include\s+(["'<])([^"'<>/\|\b]+)*([">])

But I'm kind of stuck at this point: it doesn't match in the second case, and I'm not sure how to get the result of the match (e.g. the file path) back after calling regcomp().

BTW I've looked at regexplib.com, but can't find anything suitable.

Edit: Yes I am a total regexp newbie, using POSIX regex with regmatch_t and friends...

Upvotes: 9

Views: 5790

Answers (7)

mara004

Reputation: 2342

That's what I did, in Python:

INCLUDE_RE = re.compile(rb'^\s*\#include\s+(?:<(.+)>|"(.+)")', re.MULTILINE)

This is to match in a byte string or mmap, as per https://stackoverflow.com/a/43060761/15547292. If you want to match in a regular Python string, you'll have to remove the b prefix.

Upvotes: 0

Laurence Gonsalves

Reputation: 143184

This would give better results:

^\s*\#include\s+["<]([^">]+)[">]

You then want to look at the first capture group when you get a match.

You don't say what language you're using, but the fact that you mention regcomp() leads me to believe that you're using the POSIX regex library in C. If that's right, then you want to use the regexec function and use the nmatch and pmatch parameters to get the first capture group.
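To make the nmatch/pmatch part concrete, here is a minimal sketch in C. The helper name include_path and its buffer-based interface are made up for illustration, and the pattern spells whitespace as [[:space:]] because \s is a GNU extension rather than portable POSIX ERE syntax. It is one possible way to wire this up, not the only one.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of any library): copies the path from an
 * #include line into out. Returns 0 on success, -1 if the line does not
 * match, the pattern fails to compile, or out is too small. */
int include_path(const char *line, char *out, size_t outsz)
{
    /* Same shape as the regex above, with \s written as [[:space:]]. */
    static const char *pattern =
        "^[[:space:]]*#include[[:space:]]+[\"<]([^\">]+)[\">]";
    regex_t re;
    regmatch_t pmatch[2];   /* [0] = whole match, [1] = first capture group */
    int rc = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* nmatch = 2 asks regexec to fill pmatch[0] and pmatch[1]. */
    if (regexec(&re, line, 2, pmatch, 0) == 0) {
        /* rm_so/rm_eo are byte offsets into line; rm_eo is one past the end. */
        size_t len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
        if (len < outsz) {
            memcpy(out, line + pmatch[1].rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }

    regfree(&re);
    return rc;
}
```

Calling include_path("#include <stdio.h>", buf, sizeof buf) with a char buf[64] should leave stdio.h in buf. In real code you would probably compile the pattern once and reuse it rather than recompiling per line.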

Upvotes: 6

IsaacR

Reputation: 1

This works for me:

'\#include\s*(<([^"<>|\b]+)>|"([^"<>|\b]+)")'

Upvotes: 0

Drake

Reputation: 8392

If you want a more precise solution that also allows comments before the #include directive, as in, for example,

  /* ops, a comment */ /* oh, another comment */   #include  "new_header1.h" /* let's try another with an #include "old_header.h" */

you can use:

^(?:\s*|\s*\/\*.*?\*\/)\s*#include\s*(?:(?:<)(?<PATH>.*?)(?:>)|(?:")(?<PATH>.*?)(?:"))

Upvotes: 2

Clement Herreman

Reputation: 10536

Here's what I wrote:

#include ((<[^>]+>)|("[^"]+"))

Does it fit?

Upvotes: 7

KiNgMaR

Reputation: 1567

Not particularly well tested, but it matches your two cases:

^\s*#include\s+(<([^"'<>|\b]+)>|"([^"'<>|\b]+)")

The only catch is that, because the <...> and "..." forms are separate alternatives, the path ends up in capture group 2 or 3, so check whether group 2 is empty and, if it is, use group 3. The advantage over some of the other answers is that it won't match something like #include "bad.h> or #include <bad<<h>.

And here's an example of how to use (wrap) regcomp & friends:

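 #include <regex.h>
 #include <string>
 #include <vector>

 // Returns true if sSubject matches sRegEx. If vCaptureGroups is non-null, it
 // receives the whole match (index 0) followed by each capture group; groups
 // that did not take part in the match are stored as empty strings.
 static bool regexMatch(const std::string& sRegEx, const std::string& sSubject, std::vector<std::string> *vCaptureGroups)
 {
  regex_t re;
  int flags = REG_EXTENDED | REG_ICASE;
  int status;

  // No capture groups requested: let the library skip sub-match tracking.
  if(!vCaptureGroups) flags |= REG_NOSUB;

  if(regcomp(&re, sRegEx.c_str(), flags) != 0)
  {
   return false;
  }

  if(vCaptureGroups)
  {
   size_t mlen = re.re_nsub + 1;
   regmatch_t *rawMatches = new regmatch_t[mlen];

   status = regexec(&re, sSubject.c_str(), mlen, rawMatches, 0);

   vCaptureGroups->clear();
   vCaptureGroups->reserve(mlen);

   if(status == 0)
   {
    for(size_t i = 0; i < mlen; i++)
    {
     if(rawMatches[i].rm_so < 0)
     {
      // Unused group (rm_so == -1): substr() would throw on this offset.
      vCaptureGroups->push_back(std::string());
      continue;
     }
     // rm_eo is one past the last matched character, so the length is rm_eo - rm_so.
     vCaptureGroups->push_back(sSubject.substr(rawMatches[i].rm_so, rawMatches[i].rm_eo - rawMatches[i].rm_so));
    }
   }

   delete[] rawMatches;
  }
  else
  {
   status = regexec(&re, sSubject.c_str(), 0, NULL, 0);
  }

  regfree(&re);

  return (status == 0);
 }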
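For the "group 2 or 3" check in plain C, a minimal sketch could look like the following. The function name include_path_strict is hypothetical. Note two deliberate departures from the regex above, both of which are my assumptions about what is wanted: whitespace is written as [[:space:]] (\s is a GNU extension), and the \b is dropped from the bracket expressions, since POSIX treats a backslash inside [...] as a literal character, so \b there would exclude the letter b from file names.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: extracts the path from #include <...> or
 * #include "..." and rejects mismatched delimiters.
 * Returns 0 on success, -1 on no match, compile failure, or short buffer. */
int include_path_strict(const char *line, char *out, size_t outsz)
{
    /* Group 1 = delimited path, group 2 = <...> body, group 3 = "..." body. */
    static const char *pattern =
        "^[[:space:]]*#include[[:space:]]+"
        "(<([^\"'<>|]+)>|\"([^\"'<>|]+)\")";
    regex_t re;
    regmatch_t m[4];
    int rc = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, line, 4, m, 0) == 0) {
        /* A group that did not participate has rm_so == -1. */
        const regmatch_t *g = (m[2].rm_so != -1) ? &m[2] : &m[3];
        size_t len = (size_t)(g->rm_eo - g->rm_so);
        if (len < outsz) {
            memcpy(out, line + g->rm_so, len);
            out[len] = '\0';
            rc = 0;
        }
    }

    regfree(&re);
    return rc;
}
```

With this sketch, a line such as #include "bad.h> is rejected because neither alternative finds its matching closing delimiter.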

Upvotes: 1

Nick Dandoulakis

Reputation: 43120

You can try this regex:

(^\s*\#\s*include\s*<([^<>]+)>)|(^\s*\#\s*include\s*"([^"]+)")

I prefer to have separate regexes for the #include <> and #include "" forms.

Upvotes: 3
