too much php

Reputation: 91068

Writing a compiler for a DSL in python

I am writing a game in Python and have decided to create a DSL for the map data files. I know I could write my own parser with regular expressions, but I am wondering whether there are existing Python tools that make this easier, like re2c, which is used in the PHP engine.

Some extra info:

Upvotes: 8

Views: 9902

Answers (8)

Eli Bendersky

Reputation: 273716

DSLs are a good thing, so you don't need to defend yourself :-) However, have you considered an internal DSL? These have so many advantages over external (parsed) DSLs that they are at least worth considering. Mixing a DSL with the power of the host language solves a lot of problems for you, and Python is not bad at internal DSLs, especially with the with statement at hand.
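
As a rough sketch of what an internal DSL for map data could look like, the with statement can express nesting. The `MapBuilder` class and its `room` and `item` methods are hypothetical names made up for this illustration, not part of any library:

```python
from contextlib import contextmanager


class MapBuilder:
    """Hypothetical builder that collects rooms and their contents."""

    def __init__(self, name):
        self.name = name
        self.rooms = {}
        self._current = None  # room currently being described, if any

    @contextmanager
    def room(self, room_name):
        # Entering the block selects the room; leaving it deselects it.
        self._current = self.rooms.setdefault(room_name, [])
        try:
            yield self
        finally:
            self._current = None

    def item(self, kind, x, y):
        if self._current is None:
            raise RuntimeError("item() must be called inside a room block")
        self._current.append((kind, x, y))


# The "map file" is ordinary Python that reads like a declaration.
level = MapBuilder("dungeon")
with level.room("entrance"):
    level.item("door", 0, 3)
    level.item("torch", 2, 1)
with level.room("vault"):
    level.item("chest", 5, 5)

print(level.rooms)
```

Because the map file is plain Python, loops, variables and functions are available inside it at no extra cost.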

Upvotes: 2

abdullahmjawaz

Reputation: 11

Here is a simpler approach.

What if I could extend Python syntax with new operators to add new functionality to the language? For example, a new operator <=> for swapping the values of two variables.

How can I implement such behavior? This is where the ast module comes in. It is a handy tool for working with abstract syntax trees. What’s cool about this module is that it lets me write Python code that generates a tree and then compiles it.

Let’s say we want to compile a superset language (a Python-like language) to Python:

from:

    a <=> b

to:

    a , b = b , a

  1. I need to convert my Python-like source code into a list of tokens, so I need a tokenizer: a lexical scanner for Python source code. The standard tokenize module provides one.

  2. I can use the same meta-language both to define the grammar of the new Python-like language and to build the structure of the abstract syntax tree (AST).

Why use an AST?

  1. Working with an AST is a much safer choice when evaluating untrusted code.
  2. The tree can be manipulated before the code is executed.

Working on the tree:

    import ast
    import io
    from tokenize import tokenize, untokenize, NAME, OP

    s = b"a <=> b\n"  # could also be read from a file
    tokens = list(tokenize(io.BytesIO(s).readline))

    result = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # Naive approach: compile "a <=> b" to "a, b = b, a".
        # The tokenizer splits "<=>" into the two operators "<=" and ">".
        if (tok.type == OP and tok.string == '<='
                and i + 2 < len(tokens) and tokens[i + 1].string == '>'):
            first = result.pop()                   # left operand, already emitted
            second = (NAME, tokens[i + 2].string)  # right operand
            result.extend([first, (OP, ','), second, (OP, '='),
                           second, (OP, ','), first])
            i += 3  # skip "<=", ">" and the right operand
        else:
            # Peeking by index means a plain "<=" no longer swallows a token.
            result.append((tok.type, tok.string))
            i += 1

    src = untokenize(result).decode('utf-8')
    tree = ast.parse(src)
    code = compile(tree, filename='<dsl>', mode='exec')


    def my_swap(a, b):
        # Run the compiled swap in a fresh namespace and read the results back.
        env = {"a": a, "b": b}
        exec(code, env)
        return env['a'], env['b']

    print(my_swap(1, 10))  # prints (10, 1)
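
The rewrite above works on tokens and only uses ast to parse the result. As a minimal sketch of manipulating the tree itself before executing it, the standard `ast.NodeTransformer` class can rewrite nodes. The `DoubleNumbers` transformer and its doubling rule are made up purely for illustration:

```python
import ast


class DoubleNumbers(ast.NodeTransformer):
    """Illustrative transformer: doubles every integer literal in the tree."""

    def visit_Constant(self, node):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(value=node.value * 2), node)
        return node


tree = ast.parse("x = 1 + 20")
tree = DoubleNumbers().visit(tree)
ast.fix_missing_locations(tree)  # fill in line/column info for new nodes

env = {}
exec(compile(tree, filename="<dsl>", mode="exec"), env)
print(env["x"])

# For pure data, ast.literal_eval accepts only Python literals, not arbitrary code.
settings = ast.literal_eval("{'width': 10, 'tiles': [1, 2, 3]}")
print(settings)
```

If the map files only need to hold data, `ast.literal_eval` alone may be enough and avoids `exec` entirely.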

Other projects that use ASTs, whose source code may be a useful reference:

  • textX-LS: a DSL for describing a collection of shapes and drawing them.

  • pony orm: lets you write database queries as Python generators and lambdas, which are translated into SQL query strings; pony orm uses the AST under the hood.

  • osso: a role-based access control framework for handling permissions.

Upvotes: 1

ideasman42

Reputation: 48198

Along the lines of declarative Python, I wrote a helper module called 'bpyml', which lets you declare data in Python in an XML-like structure without the verbose tags. It can be converted to and from XML, but it is valid Python.

https://svn.blender.org/svnroot/bf-blender/trunk/blender/release/scripts/modules/bpyml.py

Example use: http://wiki.blender.org/index.php/User:Ideasman42#Declarative_UI_In_Blender

Upvotes: 1

user200905

Reputation:

For "small languages" such as the one you are describing, I use a simple split, shlex (mind that # starts a comment) or regular expressions.

>>> line = 'SOMETHING: !abc @123 #xyz/123'

>>> line.split()
['SOMETHING:', '!abc', '@123', '#xyz/123']

>>> import shlex
>>> list(shlex.shlex(line))
['SOMETHING', ':', '!', 'abc', '@', '123']

The following regular expression is only an example, as I do not know exactly what you are looking for.

>>> import re
>>> result = re.match(r'([A-Z]*): !([a-z]*) @([0-9]*) #([a-z0-9/]*)', line)
>>> result.groups()
('SOMETHING', 'abc', '123', 'xyz/123')
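
Building on the regular expression above, here is a sketch of parsing several such lines into dictionaries with named groups. The field names (`key`, `flag`, `num`, `path`) and the `parse_lines` helper are assumptions, since the original line format carries no names:

```python
import re

LINE_RE = re.compile(
    r'(?P<key>[A-Z]+): !(?P<flag>[a-z]+) @(?P<num>[0-9]+) #(?P<path>[a-z0-9/]+)'
)


def parse_lines(text):
    """Return one dict per non-blank line; raise on lines that do not match."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        m = LINE_RE.fullmatch(line.strip())
        if m is None:
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
        record = m.groupdict()
        record['num'] = int(record['num'])  # convert the numeric field
        records.append(record)
    return records


data = """
SOMETHING: !abc @123 #xyz/123
OTHER: !def @7 #a/b/c
"""
print(parse_lines(data))
```

Using `fullmatch` instead of `match` rejects lines with trailing junk, which helps catch typos in the data files early.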

Upvotes: 2

user21037

Reputation:

I have written something like this at work to read in SNMP notification definitions and automatically generate Java classes and SNMP MIB files from them. Using this little DSL, I could write 20 lines of specification and it would generate roughly 80 lines of Java code and a 100-line MIB file.

To implement this, I actually just used straight Python string handling (split(), slicing, etc.) to parse the file. I find Python's string capabilities to be adequate for most of my (simple) parsing needs.
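
A minimal sketch of that style of parsing follows. It assumes a made-up spec format of `Name: field=type, field=type` per line, because the real notification format is not shown here, and the `parse_spec` and `render_java_class` helpers are hypothetical:

```python
def parse_spec(text):
    """Parse 'Name: field=type, field=type' lines using only str methods."""
    specs = {}
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, _, rest = line.partition(':')
        fields = []
        for part in rest.split(','):
            field, _, ftype = part.strip().partition('=')
            fields.append((field.strip(), ftype.strip()))
        specs[name.strip()] = fields
    return specs


def render_java_class(name, fields):
    """Generate a bare-bones Java class from one parsed entry."""
    body = '\n'.join(f'    private {ftype} {field};' for field, ftype in fields)
    return f'public class {name} {{\n{body}\n}}'


spec = """
# hypothetical notification definitions
LinkDown: ifIndex=int, reason=String
"""
parsed = parse_spec(spec)
print(render_java_class('LinkDown', parsed['LinkDown']))
```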

Besides the libraries mentioned by others, if I were writing something more complex and needed proper parsing capabilities, I would probably use ANTLR, which supports Python (and other languages).

Upvotes: 2

Piotr Lesnicki

Reputation: 9730

Yes, there are many -- too many -- parsing tools, but none in the standard library.

From what I have seen, PLY and SPARK are popular. PLY is like yacc, but you do everything in Python because you write your grammar in docstrings.

Personally, I like the concept of parser combinators (taken from functional programming), and I quite like pyparsing: you write your grammar and actions directly in Python and it is easy to start with. I ended up producing my own tree node types with actions though, instead of using their default ParserElement type.

Otherwise, you can also use an existing declarative language such as YAML.
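
To illustrate the parser combinator idea itself, here is a hand-rolled sketch using only the standard library. It is not pyparsing's API: the helpers `regex`, `seq`, `many` and `action` are invented for this example, and each parser is assumed to be a function taking `(text, pos)` and returning `(value, new_pos)` or `None`:

```python
import re


def regex(pattern):
    """Parser that matches a regular expression at the current position."""
    compiled = re.compile(pattern)

    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(), m.end()) if m else None
    return parse


def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def many(parser):
    """Apply a parser zero or more times and collect the results."""
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                return values, pos
            value, pos = result
            values.append(value)
    return parse


def action(parser, func):
    """Transform a parser's result with func (a 'parse action')."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, pos = result
        return func(value), pos
    return parse


# Grammar for lines such as "wall 3 4": a name followed by integers.
ws = regex(r'\s*')
name = regex(r'[a-z]+')
number = action(regex(r'[0-9]+'), int)
# seq(ws, number) yields [whitespace, int]; keep only the int.
spaced_number = action(seq(ws, number), lambda v: v[1])
entry = action(seq(name, many(spaced_number)), lambda v: (v[0], v[1]))

print(entry("wall 3 4", 0))
```

The grammar is built by composing small functions, and the `action` wrapper is where custom tree node types could be constructed instead of plain tuples.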

Upvotes: 5

S.Lott

Reputation: 391992

Here's an approach that works really well.

abc= ONETHING( ... )
xyz= ANOTHERTHING( ... )
pqr= SOMETHING( this=abc, that=123, more=(xyz,123) )

Declarative. Easy-to-parse.

And...

It's actually Python. A few class declarations and the work is done. The DSL is actually class declarations.

What's important is that a DSL merely creates objects. When you define a DSL, first you have to start with an object model. Later, you put some syntax around that object model. You don't start with syntax, you start with the model.
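
A sketch of this idea applied to the map data in the question. The `Tile`, `Room` and `GameMap` classes and their fields are hypothetical, chosen only to show an object model that doubles as the DSL:

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    kind: str
    walkable: bool = True


@dataclass
class Room:
    name: str
    floor: Tile
    exits: dict = field(default_factory=dict)  # direction -> Room


@dataclass
class GameMap:
    start: Room
    rooms: list


# The "DSL": plain constructor calls that build the object model.
stone = Tile("stone")
lava = Tile("lava", walkable=False)
hall = Room("hall", floor=stone)
pit = Room("pit", floor=lava, exits={"north": hall})
world = GameMap(start=hall, rooms=[hall, pit])

print(world.start.name, [r.name for r in world.rooms])
```

If a custom syntax is added later, its parser only has to produce these same objects.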

Upvotes: 7

Matthew Trevor

Reputation: 14961

I've always been impressed by pyparsing. The author, Paul McGuire, is active on the python list/comp.lang.python and has always been very helpful with any queries concerning it.

Upvotes: 13
