kindall

Reputation: 184091

Python parser for Python-like language

I'm looking to write a Python import filter or preprocessor for source files that are essentially Python with extra language elements. The goal is to read the source file, parse it to an abstract syntax tree, apply some transforms in order to implement the new parts of the language, and write valid Python source which can then be consumed by CPython. I want to write this thing in Python and am looking for the best parser for the task.

The parser built into Python is not appropriate because it requires the source files to be valid Python, which these will not be. There are plenty of parsers (and parser generators) that work with Python, but it's hard to tell which best fits my needs without a lot of research.

In summary, my requirements are:

  1. Parser is written in Python or has Python bindings.
  2. Comes with a Python grammar that I can tweak, or can easily consume a tweakable Python grammar available elsewhere (such as http://docs.python.org/reference/grammar.html).
  3. Can re-serialize the AST after transforming it.
  4. Should not be too horrific to work with API-wise.

Any suggestions?

Upvotes: 14

Views: 3601

Answers (3)

Erez

Reputation: 1430

I would recommend that you check out my library: https://github.com/erezsh/lark

It can parse ALL context-free grammars, automatically builds an AST (with line & column numbers), and accepts the grammar in EBNF format, which is considered the standard.

It can easily parse a language like Python, and it can do so faster than any other parsing library written in Python.

In fact, there's already an example Python grammar and parser.

Upvotes: 7

Paulo Scardine

Reputation: 77251

I like SimpleParse a lot, but I have never tried to feed it the Python grammar (BTW, is it a deterministic grammar?). If it chokes, PLY will do the job.

See this compilation about Python parsing tools.

Upvotes: 2

Sven Marnach

Reputation: 601441

The first thing that comes to mind is lib2to3. It is a complete pure-Python implementation of a Python parser. It reads a Python grammar file and parses Python source files according to this grammar. It offers a great infrastructure for performing AST manipulations and writing back nicely formatted Python code -- after all, its purpose is to transform between two Python-like languages with slightly different grammars.

Unfortunately it's lacking documentation and doesn't guarantee a stable interface. There are projects that build on top of lib2to3 nevertheless, and the source code is quite readable. If API stability is an issue, you can just fork it.
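As a rough illustration of the parse, transform, and write-back round trip, here is a minimal sketch. It assumes an interpreter that still ships lib2to3 (the module was deprecated in Python 3.9 and removed in 3.13), and it uses the stock grammar rather than a tweaked one. The helper name `rename_identifier` is made up for this example; a real transform for new language elements would rewrite subtrees rather than single tokens.

```python
# Sketch only: lib2to3 has no stable documented API, so treat these calls
# as "what works with the bundled module", not as a guaranteed interface.
from lib2to3 import pygram, pytree
from lib2to3.pgen2 import driver, token


def rename_identifier(source, old, new):
    """Parse `source`, rename every NAME token `old` to `new`, re-serialize.

    `rename_identifier` is a hypothetical helper written for this example.
    """
    # pygram.python_grammar is built from lib2to3's own Grammar.txt.
    # Assumption: a modified copy of that file could be loaded instead
    # with driver.load_grammar(path_to_grammar).
    parser = driver.Driver(pygram.python_grammar, convert=pytree.convert)
    tree = parser.parse_string(source)

    # Walk every leaf (token) of the tree and edit matching names in place.
    for leaf in tree.leaves():
        if leaf.type == token.NAME and leaf.value == old:
            leaf.value = new

    # str() on the tree writes the source back out, keeping the original
    # comments and whitespace.
    return str(tree)


if __name__ == "__main__":
    print(rename_identifier("total = spam + 1  # keep me\n", "spam", "eggs"))
```

The point of the sketch is that the tree keeps comments and spacing, so the code that is written back stays close to the original input.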

Upvotes: 9
