Reputation: 1477
I want to get the first #include statement from a .cpp file using a Python regex, as fast as possible.
For example,
/* Copyright:
This file is
protected
#include <bad.h>
*/
// Include files:
#undef A_MACRO
#include <stddef.h> // defines NULL
#include "logger.h"
// Global static pointer used to ensure a single instance of the class.
Logger* Logger::m_pInstance = NULL;
should return #include <stddef.h>
I know one way is to remove all comments and then take the first line of the remaining text. But this does not seem fast enough, since it has to go through the whole file. If I only need the first #include statement, is there an efficient way to do it using a Python regex?
[Update 1] Several folks mentioned it's not a good solution to use regex. I understand this is not a typical use case of regex. But is there a better way to get rid of the leading comments than regex? Any suggestion would be appreciated.
[Update 2] Thanks for the answers, but none of them satisfies me yet. My requirements are straightforward: (1) avoid going through the whole file just to get the first #include line; (2) handle the leading comments correctly.
Upvotes: 5
Views: 1330
Reputation: 10208
Does it have to be regex? The code below stops at the first #include line, handles multi-line /* ... */ comments, and doesn't break on the // /*This is a comment case.
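incomment = False  # True while inside a /* ... */ block comment
with open(r'myheader.h') as f:
    for line in f:
        if not incomment:
            # Drop any trailing // comment before inspecting the line
            line = line.split('//')[0]
            if line.startswith('#include'):
                print(line.rstrip())
                break
        # Track block-comment state from the markers on this line
        if '/*' in line:
            incomment = True
        if '*/' in line:
            incomment = False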
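If a regex is still preferred, the same comment-aware scan can be sketched as a single alternation that consumes comments and returns at the first real #include. This is only a sketch: the names _TOKEN and first_include are made up for illustration, it assumes block comments are terminated, and it ignores string literals and #if guards.

```python
import re

# One alternation: whichever construct starts first "wins", so an
# #include inside a comment is swallowed by the comment branch.
_TOKEN = re.compile(
    r'/\*.*?\*/'       # block comment, non-greedy, may span lines
    r'|//[^\n]*'       # line comment up to the end of the line
    r'|^[ \t]*(?P<inc>#[ \t]*include[ \t]*(?:<[^>\n]*>|"[^"\n]*"))',
    re.MULTILINE | re.DOTALL,
)


def first_include(text):
    """Return the first #include outside comments, or None."""
    # finditer is lazy, so scanning stops at the first include found.
    for m in _TOKEN.finditer(text):
        if m.group('inc'):
            return m.group('inc')
    return None
```

Because the match is found lazily, one option is to pass only the start of the file, for example f.read(65536), on the assumption that the first #include sits within that prefix.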
Upvotes: 0
Reputation: 52039
What about using the C-preprocessor itself?
If you run gcc -E foo.cpp (where foo.cpp is your sample input file) you will get:
# 1 "foo.cpp"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 326 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "foo.cpp" 2
# 1 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/include/stddef.h" 1 3 4
The lines before # 1 "foo.cpp" 2 are boilerplate and can be ignored. (See what your C-preprocessor generates here.)
When you get to # 1 some-other-file ... you know you've hit a #include.
You will get a complete path name (not the way it appears in the #include statement), but you can also deduce where the #include appeared by looking backwards for the last line marker.
In this case the last line marker is # 1 "foo.cpp" 2 and it appears 9 lines back, so the #include for stddef.h was on line 9 of foo.cpp.
So now you can go back to the original file and grab line 9.
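The line-marker bookkeeping described above could be sketched in Python roughly as follows. The function name first_include_line is hypothetical, and the sketch assumes the preprocessor names the source file in its markers exactly as it was given on the command line, and that pseudo-files such as <built-in> start with "<". Note that the preprocessor itself still reads the whole file and everything it includes.

```python
import re

# A linemarker looks like:  # linenum "filename" flags
LINEMARKER = re.compile(r'^# (\d+) "([^"]*)"((?: \d)*)\s*$')


def first_include_line(cpp_output, source_name):
    """Return the line of source_name holding its first #include, or None."""
    current = None  # source line of the next output line, None if elsewhere
    for raw in cpp_output.splitlines():
        m = LINEMARKER.match(raw)
        if not m:
            if current is not None:
                current += 1  # an ordinary output line advances the count
            continue
        lineno, fname, flags = int(m.group(1)), m.group(2), m.group(3).split()
        # Flag 1 marks the start of a new file: that is the #include.
        if current is not None and '1' in flags and not fname.startswith('<'):
            return current
        # Resynchronise whenever a marker points back at the source file.
        current = lineno if fname == source_name else None
    return None
```

The input text could come from something like subprocess.run(["gcc", "-E", "foo.cpp"], capture_output=True, text=True).stdout, after which the returned line number is looked up in the original file.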
Upvotes: 1
Reputation: 10069
You can use a library called CppHeaderParser like this:
import CppHeaderParser

# Parse the file and list every #include it contains
cppHeader = CppHeaderParser.CppHeader("test.cpp")
print("List of includes:")
for incl in cppHeader.includes:
    print(" %s" % incl)
For it to work you should do
pip install cppheaderparser
It outputs:
List of includes:
<stddef.h> // defines NULL
"logger.h"
Certainly not the best result, but it's a start.
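To tidy that result up, the trailing comment on an entry can be trimmed with a few lines of plain Python. This is a sketch that assumes the entries look exactly like the output shown above; the helper names strip_trailing_comment and first_include_statement are made up for illustration and are not part of CppHeaderParser.

```python
def strip_trailing_comment(include_entry):
    """Drop a trailing // or /* comment from one include entry."""
    for marker in ('//', '/*'):
        include_entry = include_entry.split(marker, 1)[0]
    return include_entry.strip()


def first_include_statement(includes):
    """Rebuild the first #include statement from a list of entries."""
    if not includes:
        return None
    return '#include ' + strip_trailing_comment(includes[0])


# Entries copied from the output shown above
entries = ['<stddef.h> // defines NULL', '"logger.h"']
print(first_include_statement(entries))
```

In practice the list passed in would be cppHeader.includes from the snippet above.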
Upvotes: 4