I have some code (taken and adapted from here and here) that uses libclang to parse C++ source files in Python (on Windows) and list all of their declarations:
import clang.cindex

def parse_decl(node):
    # Recursively print every declaration cursor below `node`.
    reference_node = node.get_definition()
    if node.kind.is_declaration():
        print(node.kind, node.kind.name,
              node.location.line, ',', node.location.column,
              reference_node.displayname)
    for ch in node.get_children():
        parse_decl(ch)

# configure path
clang.cindex.Config.set_library_file('C:/Program Files (x86)/LLVM/bin/libclang.dll')
index = clang.cindex.Index.create()
trans_unit = index.parse(r'C:\path\to\sourcefile\test.cpp', args=['-std=c++11'])
parse_decl(trans_unit.cursor)
For the following C++ source file (test_ok.cpp):
/* test_ok.cpp
 */
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <cmath>
#include <iomanip>

using namespace std;

int main (int argc, char *argv[]) {
  int linecount = 0;
  double array[1000], sum=0, median=0, add=0;
  string filename;
  if (argc <= 1)
  {
    cout << "Error: no filename specified" << endl;
    return 0;
  }
//program checks if a filename is specified
  filename = argv[1];
  ifstream myfile (filename.c_str());
  if (myfile.is_open())
  {
    myfile >> array[linecount];
    while ( myfile.good() )
    {
      // ...
      myfile >> array[linecount];
    }
    // ...
  }
}
the parse method works as it should and outputs:
CursorKind.FUNCTION_DECL FUNCTION_DECL 12 , 5 main(int, char **)
CursorKind.PARM_DECL PARM_DECL 12 , 15 argc
CursorKind.PARM_DECL PARM_DECL 12 , 27 argv
CursorKind.VAR_DECL VAR_DECL 13 , 7 linecount
CursorKind.VAR_DECL VAR_DECL 14 , 10 array
CursorKind.VAR_DECL VAR_DECL 14 , 23 sum
CursorKind.VAR_DECL VAR_DECL 14 , 30 median
CursorKind.VAR_DECL VAR_DECL 14 , 40 add
CursorKind.VAR_DECL VAR_DECL 15 , 10 filename
CursorKind.VAR_DECL VAR_DECL 23 , 12 myfile
Process finished with exit code 0
However, for the following C++ source file (test.cpp):
/* test.cpp
 */
#include <iostream>
#include <vector>
#include <fstream>
#include <cmath>
#include <algorithm>
#include <iomanip>

using namespace std;

void readfunction(vector<double>& numbers, ifstream& myfile)
{
  double number;
  while (myfile >> number) {
    // ...
  }
}

double meanfunction(vector<double>& numbers)
{
  double total=0;
  vector<double>::const_iterator i;
  for (i=numbers.begin(); i!=numbers.end(); ++i) {
    total +=*i; }
  return total/numbers.size();
}
the parsing is incomplete:
CursorKind.VAR_DECL VAR_DECL 10 , 6 readfunction
Process finished with exit code 0
The parser cannot handle lines such as vector<double>& numbers and stops parsing that part of the code.
I believe the issue is similar to the one described in another SO question. I have tried passing -std=c++11 explicitly as a parse argument, with no success. An answer to that question (even though it did not solve the problem there) also suggests using -x c++, but I have no idea how to add that to my code above.
Can anyone point to a solution that lets libclang parse C++ statements like the ones in test.cpp?
Also, can I make it continue parsing even if it reaches a token it cannot parse?
Upvotes: 4
By default, libclang doesn't add the compiler system include path.
Always check the diagnostics: like compiler error messages, they tend to indicate how to resolve an issue. In this case, they would have made it reasonably obvious that there was an include problem:
<Diagnostic severity 4, location <SourceLocation file 'test.cpp', line 3, column 10>, spelling "'iostream' file not found">
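As a minimal sketch of that check, the helper below formats the entries found in trans_unit.diagnostics. The names format_diagnostics and min_severity are made up for this illustration, and the stand-in object only mimics the severity, spelling and location attributes of a libclang diagnostic so the snippet runs without libclang installed.

```python
from types import SimpleNamespace

def format_diagnostics(diagnostics, min_severity=3):
    """Return one line per diagnostic at or above min_severity.

    libclang severities run from 0 (ignored) to 4 (fatal); 3 is an error.
    """
    lines = []
    for diag in diagnostics:
        if diag.severity >= min_severity:
            loc = diag.location
            filename = loc.file.name if loc.file is not None else '<unknown>'
            lines.append('%s:%d:%d: severity %d: %s' % (
                filename, loc.line, loc.column, diag.severity, diag.spelling))
    return lines

# Stand-in shaped like the diagnostic quoted above (an assumption for the demo).
# With a real translation unit, pass trans_unit.diagnostics instead.
fake = SimpleNamespace(
    severity=4,
    spelling="'iostream' file not found",
    location=SimpleNamespace(file=SimpleNamespace(name='test.cpp'),
                             line=3, column=10),
)

for line in format_diagnostics([fake]):
    print(line)
```

Running something like this right after index.parse(...) makes a missing-header problem visible before you start walking the AST.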
If you make sure libclang adds those paths, it should start working.
This question includes an approach to solving this problem. It seems to be a recurring theme on Stack Overflow, so I wrote ccsyspath to help find those paths on OS X, Linux and Windows. Simplifying your code slightly:
import clang.cindex
clang.cindex.Config.set_library_file('C:/Program Files (x86)/LLVM/bin/libclang.dll')
import ccsyspath

index = clang.cindex.Index.create()

# Force C++ mode and append the compiler's system include paths.
args    = '-x c++ --std=c++11'.split()
syspath = ccsyspath.system_include_paths('clang++')
incargs = [ b'-I' + inc for inc in syspath ]
args    = args + incargs

trans_unit = index.parse('test.cpp', args=args)
for node in trans_unit.cursor.walk_preorder():
    # Skip cursors without a file and cursors that come from included headers.
    if node.location.file is None:
        continue
    if node.location.file.name != 'test.cpp':
        continue
    if node.kind.is_declaration():
        print(node.kind, node.location)
where my args end up being:
['-x',
 'c++',
 '--std=c++11',
 '-IC:\\Program Files (x86)\\LLVM\\bin\\..\\lib\\clang\\3.8.0\\include',
 '-IC:\\Program Files (x86)\\Microsoft Visual Studio 12.0\\VC\\include',
 '-IC:\\Program Files (x86)\\Windows Kits\\8.1\\include\\shared',
 '-IC:\\Program Files (x86)\\Windows Kits\\8.1\\include\\um',
 '-IC:\\Program Files (x86)\\Windows Kits\\8.1\\include\\winrt']
and the output is:
(CursorKind.USING_DIRECTIVE, <SourceLocation file 'test.cpp', line 10, column 17>)
(CursorKind.FUNCTION_DECL, <SourceLocation file 'test.cpp', line 12, column 6>)
(CursorKind.PARM_DECL, <SourceLocation file 'test.cpp', line 12, column 35>)
(CursorKind.PARM_DECL, <SourceLocation file 'test.cpp', line 12, column 54>)
(CursorKind.VAR_DECL, <SourceLocation file 'test.cpp', line 15, column 14>)
(CursorKind.FUNCTION_DECL, <SourceLocation file 'test.cpp', line 21, column 8>)
(CursorKind.PARM_DECL, <SourceLocation file 'test.cpp', line 21, column 37>)
(CursorKind.VAR_DECL, <SourceLocation file 'test.cpp', line 24, column 14>)
(CursorKind.VAR_DECL, <SourceLocation file 'test.cpp', line 25, column 40>)
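If you would rather assemble the argument list by hand, something along these lines should do. This is only a sketch: build_parse_args is a made-up helper name and the directories are hypothetical placeholders, not real paths. It also turns bytes paths into plain strings, on the assumption that your version of the bindings expects str arguments.

```python
import os

def build_parse_args(include_dirs, std='c++11'):
    """Build a libclang argument list: force C++ mode, pick a standard, add -I paths."""
    args = ['-x', 'c++', '-std=' + std]
    for inc in include_dirs:
        # os.fsdecode accepts both bytes and str and always returns str.
        args.append('-I' + os.fsdecode(inc))
    return args

# Hypothetical directories, used only to show the resulting list.
example_dirs = [b'C:\\LLVM\\lib\\clang\\3.8.0\\include', 'C:\\SDK\\include']
args = build_parse_args(example_dirs)
print(args)
```

In the script above you would then call index.parse('test.cpp', args=build_parse_args(syspath)).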
Upvotes: 5