Reputation: 413
I am currently working on a project involving a new faster execution environment/VM for Python on Linux. The python source is parsed into an intermediate AST, analyzed and the code for the target VM is generated JIT and cached. Due to the JIT nature of the proposed machine, speed is essential and I am writing it as native as possible. Currently its completely implemented in C apart a single python interface to the compiler module. Currently I am able to build the AST using the Python compiler module and have it in memory.
For example the code:
class Test:
def testFunc(arg1):
print 'Arg is ' + arg1
generates the AST
Module(None, Stmt([Class('Test', [], None, Stmt([Function(None, 'testFunc', ['arg1'], [], 0, None, Stmt([Printnl([Add((Const('Arg is '), Name('arg1')))], None)]))]), None)]))
What I want to know is a efficient method to parse this AST into a manipulable data structure like a tree, which can be traversed and the target code emitted. I am confused as to whether to go for a parser generator like Bison or Lemon or manually tokenize and parse it. Since the AST is obtained after extensive error checks, so no point of further error checks, hence I believe a parser generator is overkill. Python itself provides AST walkers but it slows it down. But then I am really not too sure how to go about manually deciphering it. I would really appreciate any algorithm or suggestion or if possible a native language implementation.
Upvotes: 0
Views: 295
Reputation: 65854
Python already has a fast parser (see Parser/parser.c
in the Python sources). You create a parser by calling PyParser_New
and send it tokens by calling PyParser_AddToken
. It builds a tree of node
objects (see Parser/node.h
):
typedef struct _node {
short n_type;
char *n_str;
int n_lineno;
int n_col_offset;
int n_nchildren;
struct _node *n_child;
} node;
So if the ast
module is too slow, use the C interface and process the parse tree directly.
Upvotes: 2