Reputation: 4433
I have been reading this article (described here) and working on my own implementation of this. In the article they use an abstract syntax tree and a few other features to create a deep learning model for source code attribution.
Is it possible to get the abstract syntax tree of a C or C++ application from a Python application?
My research suggests that this cannot be done reliably unless the code can be compiled. Is there a way to get at least portions of the tree from a Python script?
Upvotes: 0
Views: 1980
Reputation: 11430
In C++ you need to actually compile a snippet to get its AST.
You could guess from the snippet alone, but the result would come with no guarantee. The reason is that code from elsewhere (headers, earlier declarations) can change the meaning of any piece of code.
Consider:
    int x = y + z;
What do you expect the AST to be? A declaration of x initialized with an addition of y and z.
Now, what if z is declared as Foo z; where Foo is:
    class Foo
    {
    public:
        operator int() { /* bunch of operations,*/ return 0; }
    };
Suddenly, your AST contains more nodes: an implicit call to Foo's conversion operator now sits between z and the addition.
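To make this concrete, here is a minimal sketch in which the same statement, int x = y + z;, appears twice with different declarations of z. The names conversion_calls, sum_with_int and sum_with_foo are made up for illustration, and Foo is given a value member (an assumption, not part of the snippet above) so the result can be checked.

```cpp
// Hypothetical counter, used only to observe the hidden conversion call.
int conversion_calls = 0;

struct Foo {
    int value;
    // Implicit conversion to int. The compiler inserts a call to this
    // wherever a Foo is used where an int is required.
    operator int() const {
        ++conversion_calls;
        return value;
    }
};

// Here z is a plain int: the statement is just a built-in addition.
int sum_with_int(int y) {
    int z = 2;
    int x = y + z;
    return x;
}

// Same statement text, but z is a Foo: the compiler adds an implicit
// call to Foo::operator int() before performing the addition.
int sum_with_foo(int y) {
    Foo z{2};
    int x = y + z;
    return x;
}
```

Both functions return the same number, but only the second one runs the conversion operator, which is exactly the kind of node a text-only guess at the AST would miss.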
Then you can also have:
    #define z (y * y)
Again, your AST changes: after preprocessing, the initializer is y + (y * y), and z no longer exists as a variable at all.
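The macro case can be sketched the same way. The function names below are hypothetical, and the #undef is only there to keep the macro from leaking into later code.

```cpp
// Without the macro, z is an ordinary parameter.
int sum_without_macro(int y, int z) {
    int x = y + z;
    return x;
}

// With the macro in effect, the preprocessor rewrites the very same
// statement to: int x = y + (y * y);
#define z (y * y)
int sum_with_macro(int y) {
    int x = y + z;
    return x;
}
#undef z
```

A tool that only looks at the text of sum_with_macro sees a reference to z, while the compiler sees a multiplication nested inside the addition.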
So, to get it right, you need the whole thing: parsing the full source and its includes, running the preprocessor, and resolving templates, visibility and overloads.
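In practice, people hand that responsibility to an actual compiler front end, typically Clang from the LLVM project. Clang's libclang library also has Python bindings, so the AST it produces can be walked from a Python script.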
Upvotes: 3