Reputation: 703
I want to get access to the AST within cppyy before the python bindings are created. I'd like to use this to generate other kinds of bindings.
I have seen cppyy-generator, but it requires a separate installation of clang on the machine. Since cppyy can do JIT compilation without a separate installation of clang, I have to believe the AST is available from the underlying cling interpreter. Is there a way to get this AST info from cppyy?
example:
import cppyy
cppyy.cppdef("""
namespace foo
{
class Bar
{
public:
void DoSomething() {}
};
}
""")
cppyy can (amazingly) generate cppyy.gbl.foo.Bar for me. That means it must have used Cling to compile the code, obtain an AST, and generate the Python bindings. How can I see that AST data?
Thanks!
Edit:
I can see that much of the information I need is in the cppyy-backend capi and cpp_cppyy files. However, my CPython foo is not strong enough to figure out how these get called and how I might access them from a Python script.
Edit2:
Currently we're using a combination of castxml and pygccxml to generate a python data structure representing the AST. I see a lot of overlap with what cppyy does, and I wish to reduce the dependencies to cppyy only, since we're already using it for other things and it is nicely self-contained.
We use the AST data for multiple things. An important one is code generation. So we'd like to iterate the AST much like you can with pygccxml.
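For context, our current iteration over the pygccxml declaration tree looks roughly like this (a minimal sketch; "foo.h" and the exact castxml configuration are placeholders):
from pygccxml import utils, declarations, parser

# locate castxml and parse a header into pygccxml's declaration tree
generator_path, generator_name = utils.find_xml_generator()
config = parser.xml_generator_configuration_t(
    xml_generator_path=generator_path, xml_generator=generator_name)
decls = parser.parse(['foo.h'], config)

# walk namespaces/classes/methods to drive our code generation
global_ns = declarations.get_global_namespace(decls)
for cls in global_ns.namespace('foo').classes():
    for meth in cls.member_functions():
        print(cls.name, meth.name)
This is the kind of traversal we'd like to reproduce with cppyy alone.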
Upvotes: 3
Views: 487
Reputation: 3788
There are a couple of ambiguities here because the same names apply to different steps in different places. Let me explain the structure (and history), which may even answer your question.
cppyy-generator makes use of the Clang Python bindings. Thus, the AST it accesses is the C++ one, and it is available in its full (ugly) glory. You don't need any part of cppyy to use the Clang Python bindings. cppyy-generator serves a specific use case where you want all local C++ entities pre-loaded into a Python module. Since cppyy utilizes lazy everything and auto-loading, for performance reasons, the concept of "all C++ entities" (local or otherwise) does not have a well-defined meaning. Hence libclang was utilized, where the concept is clear.
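For reference, a minimal sketch of what that approach boils down to, using the Clang Python bindings directly (the header name and compiler flags here are placeholders):
import clang.cindex

# parse a translation unit and walk the full Clang AST
index = clang.cindex.Index.create()
tu = index.parse('foo.h', args=['-x', 'c++', '-std=c++17'])

for node in tu.cursor.walk_preorder():
    if node.kind == clang.cindex.CursorKind.CLASS_DECL and node.is_definition():
        print('class', node.spelling)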
The cppyy-backend capi (or C-API) is an API that was developed, by reduction, to serve the PyPy implementation of cppyy. It is a C-style API to bootstrap cppyy/C++, pared down to the essentials needed to write Python-C++ bindings and hiding many irrelevant details of the Clang AST (e.g. the 15 or so ways that a template can exist in the Clang AST are reduced to "IsTemplate", etc.). The backend C-API does not depend on, or use, Python in any way at all.
The implementation of the backend C-API is rather unsightly: in part for historic reasons (a bad thing), in part to hide all of Cling and thus Clang, to prevent clashes with other parts of an application that may be using Clang or LLVM (a good thing; the version of Clang used by Cling is customized and may not work for e.g. Numba). Again, all of this is completely independent of anything to do with Python.
Then, its use in Python. There are two different implementations: CPyCppyy for CPython, which is implemented in C, and the PyPy _cppyy module, which is implemented in RPython. Both perform the incantations to cross from Python into C++ through the C-API. Neither generates or uses the Python AST: both generate and manipulate Python entities directly, and this happens lazily. Think the steps through: the Python user will, in your example above, type something like cppyy.gbl.foo.Bar().DoSomething(). In cppyy, Python's __getattr__ is used to intercept the names, and then it simply goes through the backend to Cling to ask whether it knows what foo, Bar, etc. are. For example, the C-API GetScope("foo") will return a valid identifier, so CPyCppyy/_cppyy knows to generate a Python class to represent the namespace. At no point, however, does it scan the global (or even the foo) namespace in the AST in full to generate the bindings a priori. Based on your description, there is nothing in CPyCppyy/_cppyy that would be of use to you.
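To make that laziness concrete, a small sketch (nothing is generated up front; each attribute access triggers a lookup through the backend):
import cppyy

cppyy.cppdef("namespace foo { class Bar { public: void DoSomething() {} }; }")

ns = cppyy.gbl.foo          # __getattr__ asks the backend, e.g. GetScope("foo")
klass = ns.Bar              # only now is a Python class created for foo::Bar
print(klass.__cpp_name__)   # 'foo::Bar'
klass().DoSomething()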
To come back to your first statement: you want to generate other types of bindings. You don't state what type, but the main reason for going with the C-API would be that it sits on top of Cling, rather than on Clang (as the Clang AST, used directly from C++ or through its Python bindings, would be). Cling offers easy access to the JIT, but you could also program that directly from Clang (its libraries, not the AST). As an example of such easy access, in the backend C-API you can just dump a string of C++ to be JITted into the compile function (which does the exact same thing as cppdef in your example). There are plans by the Cling folks to provide a better interface for dynamic languages from Cling directly, but this is work in progress and not (AFAIK) available yet.
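As a sketch of that "dump a string into the JIT" route (this assumes the backend header exposes Compile taking a string of code, and uses the same cpp_cppyy.h trick shown in the example further down):
import cppyy

# make the backend C-API visible, as in the example further down
cppyy.cppdef('#define RPY_EXPORTED extern')
cppyy.include('cpp_cppyy.h')

# hand a string of C++ straight to the JIT; this is what cppyy.cppdef does for you
cppyy.gbl.Cppyy.Compile('int jit_add(int a, int b) { return a + b; }')
print(cppyy.gbl.jit_add(20, 22))    # 42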
Finally, do note that Cling contains Clang, so if you install Cling, you still get Clang (and LLVM), too, which can be a heavy dependency.
EDIT: Fundamentally it remains that, contrary to those other tools, cppyy does not offer a list of starting points (e.g. "all classes"), nor the full/true AST. You can copy over the cpp_cppyy.h header from the backend (it is not otherwise part of the installation), simply include it, and use it (all symbols are exported already), but you need to know the list of classes a priori. Example:
import cppyy

cppyy.cppdef('#define RPY_EXPORTED extern')
cppyy.include('cpp_cppyy.h')

cppyy.cppdef("""
namespace foo {
class Bar {
public:
    void DoSomething() {}
};
}""")

cpp = cppyy.gbl
capi = cpp.Cppyy

scope_id = capi.GetScope(cpp.foo.Bar.__cpp_name__)   # need to know existence
for i in range(capi.GetNumMethods(scope_id)):
    m = capi.GetMethod(scope_id, i)
    print(capi.GetMethodName(m))
But as you can see, it does not offer a one-to-one result with the original code. For example, all the compiler-generated constructors and the destructor are listed as methods.
There also isn't really anything in the backend API like run_functions = unittests.member_functions('run') as in the pygccxml documentation that you link. The reason is that such a query doesn't make sense in the context of cppyy. E.g. what if another header is loaded with more run functions? What if it is a templated function and more instantiations pop up? What if a using namespace ... appears in later code, introducing more run overloads?
cppyy does have the GetAllCppNames C-API function, but it's not guaranteed to be exhaustive. It exists for the benefit of tab completion in code editors (it is called in the customized __dir__ functions of bound scopes). In fact, it is precisely because it wasn't complete that cppyy-generator uses libclang.
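For instance (a small illustration; the exact names returned depend on what has been loaded so far):
import cppyy

cppyy.cppdef("""
namespace foo {
class Bar {};
void run(int) {}
}""")

# dir() on a bound scope goes through the customized __dir__, which calls
# GetAllCppNames underneath; useful for tab completion, but not guaranteed complete
print([n for n in dir(cppyy.gbl.foo) if not n.startswith('__')])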
You mention gInterpreter in the comments, but that is part of the history that I mentioned earlier: it's an ill-fated intermediate between the full AST as offered by libclang, and the minimalistic one needed for Python (such as the backend C-API). Yes, you can use it directly (it is, in fact, still used underneath the backend C-API), but it's a lot more clunky for little benefit.
For example, to handle that "getting all 'run' methods" example, you could do:
import cppyy
cppyy.cppdef("""
namespace foo {
void run(int) {}
void run(double) {}
}""")
cpp = cppyy.gbl
# using the ROOT/meta interface
cls = cpp.CppyyLegacy.TClass.GetClass(cpp.foo.__cpp_name__)
print('num "run" overloads:"', cls.GetListOfMethodOverloads('run').GetSize())
# directly through gInterpreter
gInterp = cpp.gInterpreter
cls = gInterp.ClassInfo_Factory(cpp.foo.__cpp_name__)
v = cpp.std.vector['const void*']()
gInterp.GetFunctionOverloads(cls, 'run', v)
gInterp.ClassInfo_Delete(cls)
print('num "run" overloads:"', len(v))
But the former interface (through CppyyLegacy.TClass) may not stay around, and the gInterpreter one is really ugly, as you can see.
I'm pretty sure you're not going to be happy trying to make cppyy replace the use of pygccxml; if I were you, I'd use the Clang Python bindings instead.
Upvotes: 4