stanleyli

Reputation: 1477

How to get the first #include statement in C++ files using Python regex?

I want to get the first #include statement from a .cpp file using Python regex as fast as possible.

For example,

/* Copyright: 
This file is 
protected 
#include <bad.h>
*/

// Include files:
#undef A_MACRO
#include <stddef.h>  // defines NULL
#include "logger.h"

// Global static pointer used to ensure a single instance of the class.
Logger* Logger::m_pInstance = NULL; 

should return #include <stddef.h>

I know one way is to remove all comments and then take the first line of the remaining text. But this does not seem fast enough, since it has to go through the whole file. If I only need the first #include statement, is there an efficient way to do it using Python regex?

[Update 1] Several folks mentioned it's not a good solution to use regex. I understand this is not a typical use case of regex. But is there a better way to get rid of the leading comments than regex? Any suggestion would be appreciated.

[Update 2] Thanks for the answers, but none of them satisfies me yet. My requirements are straightforward: (1) avoid going through the whole file to get the first #include line; (2) handle the leading comments correctly.

Upvotes: 5

Views: 1330

Answers (3)

Thane Plummer

Reputation: 10208

Does it have to be regex? The code below stops at the first #include line, handles multi-line block comments, and doesn't break on the // /*This is a comment case.

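incomment = False

with open(r'myheader.h') as f:
    for line in f:
        if not incomment:
            # Drop any trailing // comment, then surrounding whitespace.
            line = line.split('//')[0].strip()
            if line.startswith('#include'):
                print(line)
                break  # stop reading at the first #include
            if '/*' in line:
                incomment = True
        # A block comment may open and close on the same line.
        if '*/' in line:
            incomment = False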
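
If a regex is still wanted, one possible sketch is a single alternation that consumes comments and string literals first, so an #include inside them is never reported. The names TOKEN and first_include are made up for this illustration. It assumes the text (or just a leading chunk of the file) is already in memory, and it does not handle line continuations, character literals, unterminated comments, or an #include that follows a comment on the same line.

```python
import re

# Comments and strings come first in the alternation so their contents are
# consumed and skipped; only a real directive fills the "inc" group.
TOKEN = re.compile(
    r'''
      /\*.*?\*/                 # block comment, may span lines
    | //[^\n]*                  # line comment
    | "(?:\\.|[^"\\\n])*"       # string literal
    | ^[ \t]*(?P<inc>\#[ \t]*include[ \t]*(?:<[^>\n]+>|"[^"\n]+"))
    ''',
    re.DOTALL | re.MULTILINE | re.VERBOSE,
)


def first_include(text):
    """Return the first #include directive outside comments, or None."""
    # finditer is lazy, so scanning stops as soon as a directive is found.
    for m in TOKEN.finditer(text):
        if m.group('inc'):
            return m.group('inc')
    return None
```

Because the scan stops at the first directive, passing only the first few kilobytes of the file is usually enough, although a very long leading comment could require reading more.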

Upvotes: 0

ErikR

Reputation: 52039

What about using the C-preprocessor itself?

If you run gcc -E foo.cpp (where foo.cpp is your sample input file) you will get:

# 1 "foo.cpp"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 326 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "foo.cpp" 2








# 1 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/include/stddef.h" 1 3 4

The lines before # 1 "foo.cpp" 2 are boilerplate and can be ignored. (See what your C-preprocessor generates here.)

When you get to # 1 some-other-file ... you know you've hit a #include.

You will get a complete path name (not the way it appears in the #include statement), but you can also deduce where the #include appeared by looking backwards for the last line marker.

In this case the last line marker is # 1 "foo.cpp" 2, and it appears 9 lines back, so the #include for stddef.h was on line 9 of foo.cpp.

So now you can go back to the original file and grab line 9.
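
The bookkeeping described above could be sketched roughly as follows. This is only an illustration: first_include_line and LINEMARKER are hypothetical names, the function works on preprocessor output that has already been captured as a string, and it assumes the file name in the line markers matches the name passed in. It relies on flag 1 in a GCC/Clang line marker meaning "start of a new file" and skips pseudo-files such as <built-in>.

```python
import re

# Matches line markers such as:  # 1 "foo.cpp" 2
LINEMARKER = re.compile(r'^# (\d+) "([^"]*)"((?: \d+)*)\s*$')


def first_include_line(preprocessed, source_name):
    """Return the 1-based line of the first #include in source_name, or None."""
    current = None  # source line that the next output line corresponds to
    for raw in preprocessed.splitlines():
        m = LINEMARKER.match(raw)
        if m:
            lineno, fname, flags = int(m.group(1)), m.group(2), m.group(3).split()
            if fname == source_name:
                current = lineno  # marker re-synchronises the line counter
            elif current is not None and '1' in flags and not fname.startswith('<'):
                return current  # a real file was entered: this is the #include
            continue
        if current is not None:
            current += 1  # ordinary output line, advance within the source file
    return None
```

With the output shown above and source_name set to "foo.cpp", the counter reaches 9 when the stddef.h marker appears, matching the manual count.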

Upvotes: 1

You can use a library called CppHeaderParser like this:

import sys
import CppHeaderParser

cppHeader = CppHeaderParser.CppHeader("test.cpp")

print("List of includes:")
for incl in cppHeader.includes:
    print(" %s" % incl)

For this to work, install the package first:

pip install cppheaderparser

It outputs:

List of includes:
 <stddef.h>  // defines NULL
 "logger.h"

Certainly not the best result, but it's a start.

Upvotes: 4
