Horned Owl

nested dictionary output from pyparsing

I'm using pyparsing to parse an expression of the form:

"and(or(eq(x,1), eq(x,2)), eq(y,3))"

My test code looks like this:

from pyparsing import Word, alphanums, Literal, Forward, Suppress, ZeroOrMore, CaselessLiteral, Group

field = Word(alphanums)
value = Word(alphanums)
eq_ = CaselessLiteral('eq') + Group(Suppress('(') + field + Literal(',').suppress() + value + Suppress(')'))
ne_ = CaselessLiteral('ne') + Group(Suppress('(') + field + Literal(',').suppress() + value + Suppress(')'))
function = ( eq_ | ne_ )

arg = Forward()
and_ = Forward()
or_ = Forward()

arg << (and_ | or_ |  function) + Suppress(",") + (and_ | or_ | function) + ZeroOrMore(Suppress(",") + (and_ | function))

and_ << Literal("and") + Suppress("(") + Group(arg) + Suppress(")")
or_ << Literal("or") + Suppress("(") + Group(arg) + Suppress(")")

exp = (and_ | or_ | function)

print(exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))"))

The output is:

['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]

The list output looks OK, but for subsequent processing I'd like the output in the form of a nested dictionary:

{
    name: 'and',
    args: [
        {
            name: 'or',
            args: [
                {
                    name: 'eq',
                    args: ['x','1']
                },
                {
                    name: 'eq',
                    args: ['x','2']
                }
            ]
        },
        {
            name: 'eq',
            args: ['y','3']
        }
    ]
}

I have tried the Dict class, but without success.

Is it possible to do this in pyparsing, or should I manually reformat the list output?

Upvotes: 6

Views: 2885

Answers (2)

PaulMcG

The feature you are looking for is an important one in pyparsing: setting results names. Using results names is recommended practice for most pyparsing applications. This feature has been available since version 0.9, as:

expr.setResultsName("abc")

This allows me to access this particular field of the overall parsed results as res["abc"] or res.abc (where res is the value returned from parser.parseString). You can also call res.dump() to see a nested view of your results.

But, still mindful of keeping parsers easy to follow at a glance, I added support for this shorter form of setResultsName back in 1.4.6:

expr("abc")

Here is your parser with a little cleanup, and results names added:

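from pyparsing import Word, alphanums, Literal, Forward, Suppress, CaselessLiteral, Group, delimitedList

COMMA, LPAR, RPAR = map(Suppress, ",()")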
field = Word(alphanums)
value = Word(alphanums)
eq_ = CaselessLiteral('eq')("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
ne_ = CaselessLiteral('ne')("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
function = ( eq_ | ne_ )

arg = Forward()
and_ = Forward()
or_ = Forward()
exp = Group(and_ | or_ | function)

arg << delimitedList(exp)

and_ << Literal("and")("name") + LPAR + Group(arg)("args") + RPAR
or_ << Literal("or")("name") + LPAR + Group(arg)("args") + RPAR

Unfortunately, dump() only handles nesting of results, not lists of values, so its output is not quite as nice as json.dumps (maybe this would be a good enhancement to dump?). So here is a custom function to dump your nested name/args results:

ob = exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))")[0]

INDENT_SPACES = '    '

def dumpExpr(ob, level=0):
    # print one name/args group, recursing into nested groups
    indent = level * INDENT_SPACES
    print(indent + '{')
    print("%s%s: %r," % (indent + INDENT_SPACES, 'name', ob['name']))
    if ob.name in ('eq', 'ne'):
        # leaf functions: args is a flat [field, value] list
        print("%s%s: %s" % (indent + INDENT_SPACES, 'args', ob.args.asList()))
    else:
        # and/or: args holds nested groups, printed two levels deeper
        print("%s%s: [" % (indent + INDENT_SPACES, 'args'))
        for arg in ob.args:
            dumpExpr(arg, level + 2)
        print("%s]" % (indent + INDENT_SPACES))
    print(indent + '}' + (',' if level > 0 else ''))

dumpExpr(ob)

Giving:

{
    name: 'and',
    args: [
        {
            name: 'or',
            args: [
                {
                    name: 'eq',
                    args: ['x', '1']
                },
                {
                    name: 'eq',
                    args: ['x', '2']
                },
            ]
        },
        {
            name: 'eq',
            args: ['y', '3']
        },
    ]
}
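Since json.dumps already pretty-prints nested dicts and lists, another option is to copy the named results into plain Python containers first and let json.dumps do the formatting. This is only a sketch and not part of pyparsing: to_plain is a hypothetical helper that assumes each group supports ob['name'] and iteration over ob['args'], as the Group-ed, named results from the parser above do. A hand-built dict stands in for exp.parseString(...)[0] so the snippet runs on its own.

```python
import json


def to_plain(ob, leaf_names=("eq", "ne")):
    """Recursively copy a name/args tree into plain dicts and lists.

    Hypothetical helper: assumes ob["name"] is a string and ob["args"] is
    iterable, as with the Group-ed, named ParseResults from the parser above.
    """
    name = ob["name"]
    if name in leaf_names:
        # leaf functions such as eq/ne: args is a flat [field, value] list
        return {"name": name, "args": list(ob["args"])}
    # and/or: args is a sequence of nested groups, so recurse into each one
    return {"name": name, "args": [to_plain(arg, leaf_names) for arg in ob["args"]]}


# Stand-in for exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))")[0]
tree = {
    "name": "and",
    "args": [
        {"name": "or", "args": [{"name": "eq", "args": ["x", "1"]},
                                {"name": "eq", "args": ["x", "2"]}]},
        {"name": "eq", "args": ["y", "3"]},
    ],
}
print(json.dumps(to_plain(tree), indent=4))
```

With real parser output you would call print(json.dumps(to_plain(ob), indent=4)) instead. Note that json.dumps quotes the keys, so the result differs cosmetically from dumpExpr's output.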

Upvotes: 11

enrico.bacis

I don't think pyparsing has anything like that built in, but you can build the data structure recursively from the list output:

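def toDict(lst):
    # lst must be a plain nested list; for pyparsing output, pass
    # exp.parseString(...).asList() rather than the ParseResults itself
    if not isinstance(lst[1], list):
        return lst  # leaf arguments such as ['x', '1']
    # pair each function name with the argument list that follows it
    return [{'name': name, 'args': toDict(args)}
            for name, args in zip(lst[::2], lst[1::2])]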

Your example behaves differently depending on the number of args children: if there is only one, you use a bare dict; otherwise, a list of dicts. That makes the result awkward to use. It's better to use a list of dicts even when there is a single child; this way you always know how to iterate over the children without type-checking.
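To illustrate why uniform lists help, here is a hypothetical evaluator for the toDict output. The names evaluate and env are illustrative only, and the semantics are assumptions: and means all children are true, or means any child is true, and eq/ne compare a field's bound value with the literal string. Because the children of and/or are always a list, the code iterates over them without checking whether it got one dict or many.

```python
def toDict(lst):
    # same helper as above: pair each name with the list that follows it
    if not isinstance(lst[1], list):
        return lst
    return [{'name': name, 'args': toDict(args)}
            for name, args in zip(lst[::2], lst[1::2])]


def evaluate(node, env):
    """Evaluate one toDict() node against variable bindings in env (hypothetical)."""
    name, args = node['name'], node['args']
    if name == 'and':
        return all(evaluate(child, env) for child in args)
    if name == 'or':
        return any(evaluate(child, env) for child in args)
    # leaf functions: args is [field, value], compared as strings
    field, value = args
    if name == 'eq':
        return env.get(field) == value
    if name == 'ne':
        return env.get(field) != value
    raise ValueError('unknown function: %r' % name)


parsed = ['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]
root = toDict(parsed)[0]
print(evaluate(root, {'x': '2', 'y': '3'}))
print(evaluate(root, {'x': '5', 'y': '3'}))
```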

Example

We can use json.dumps to pretty-print the output (note that here we print parsedict[0] because we know the root has a single child, but toDict always returns a list, as explained above):

import json
parsed = ['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]
parsedict = toDict(parsed)
print(json.dumps(parsedict[0], indent=4, separators=(',', ': ')))

Output

{
    "name": "and",
    "args": [
        {
            "name": "or",
            "args": [
                {
                    "name": "eq",
                    "args": [
                        "x",
                        "1"
                    ]
                },
                {
                    "name": "eq",
                    "args": [
                        "x",
                        "2"
                    ]
                }
            ]
        },
        {
            "name": "eq",
            "args": [
                "y",
                "3"
            ]
        }
    ]
}

To obtain that output, I replaced the dict with a collections.OrderedDict in the toDict function, just to keep "name" before "args".
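The modified function isn't shown here, so the following is only a sketch of what the OrderedDict variant might look like, reusing the parsed list from the example above. Each dict literal is swapped for an OrderedDict built from (key, value) pairs, which fixes the key order that json.dumps writes out.

```python
import json
from collections import OrderedDict


def toDict(lst):
    # leaf args such as ['x', '1'] are returned unchanged
    if not isinstance(lst[1], list):
        return lst
    # OrderedDict keeps 'name' ahead of 'args' when serialized
    return [OrderedDict([('name', name), ('args', toDict(args))])
            for name, args in zip(lst[::2], lst[1::2])]


parsed = ['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]
parsedict = toDict(parsed)
print(json.dumps(parsedict[0], indent=4, separators=(',', ': ')))
```

On Python 3.7 and later, plain dicts also preserve insertion order, so the original toDict should already print "name" before "args" there.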

Upvotes: 2
