Reputation: 139
I'm using pyparsing to parse an expression of the form:
"and(or(eq(x,1), eq(x,2)), eq(y,3))"
My test code looks like this:
from pyparsing import Word, alphanums, Literal, Forward, Suppress, ZeroOrMore, CaselessLiteral, Group
field = Word(alphanums)
value = Word(alphanums)
eq_ = CaselessLiteral('eq') + Group(Suppress('(') + field + Literal(',').suppress() + value + Suppress(')'))
ne_ = CaselessLiteral('ne') + Group(Suppress('(') + field + Literal(',').suppress() + value + Suppress(')'))
function = ( eq_ | ne_ )
arg = Forward()
and_ = Forward()
or_ = Forward()
arg << (and_ | or_ | function) + Suppress(",") + (and_ | or_ | function) + ZeroOrMore(Suppress(",") + (and_ | function))
and_ << Literal("and") + Suppress("(") + Group(arg) + Suppress(")")
or_ << Literal("or") + Suppress("(") + Group(arg) + Suppress(")")
exp = (and_ | or_ | function)
print(exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))"))
I have output in form:
['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]
List output looks OK. But for subsequent processing I'd like to have output in form of a nested dictionary:
{
    name: 'and',
    args: [
        {
            name: 'or',
            args: [
                {
                    name: 'eq',
                    args: ['x','1']
                },
                {
                    name: 'eq',
                    args: ['x','2']
                }
            ]
        },
        {
            name: 'eq',
            args: ['y','3']
        }
    ]
}
I have tried the Dict class, but without success.
Is it possible to do it in pyparsing? Or should I manually format list output?
Upvotes: 6
Views: 2885
Reputation: 63762
The feature you are looking for is an important one in pyparsing, that of setting results names. Using results names is recommended practice for most pyparsing applications. This feature has been there since version 0.9, as
expr.setResultsName("abc")
This allows me to access this particular field of the overall parsed results as res["abc"] or res.abc (where res is the value returned from parser.parseString). You can also call res.dump() to see a nested view of your results.
But still mindful of keeping parsers easy to follow at-a-glance, I added support for this form of setResultsName back in 1.4.6:
expr("abc")
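To make the two spellings concrete, here is a minimal sketch on a made-up key=value grammar. The grammar and the names key, value and pair are illustrative assumptions, not part of the question's parser:

```python
from pyparsing import Suppress, Word, alphas, nums

# Hypothetical mini-grammar for illustration only: <letters>=<digits>
key = Word(alphas)("key")                    # short form: expr("name")
value = Word(nums).setResultsName("value")   # long form, same effect
pair = key + Suppress("=") + value

res = pair.parseString("x=42")
print(res["key"])   # dict-style access to a named result
print(res.value)    # attribute-style access to a named result
print(res.dump())   # nested view of tokens and results names
```

Both forms attach a name to the matched token; the short call form just keeps the grammar definition more compact.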
Here is your parser with a little cleanup, and results names added:
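from pyparsing import Word, alphanums, Literal, Forward, Suppress, CaselessLiteral, Group, delimitedList
COMMA,LPAR,RPAR = map(Suppress,",()")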
field = Word(alphanums)
value = Word(alphanums)
eq_ = CaselessLiteral('eq')("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
ne_ = CaselessLiteral('ne')("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
function = ( eq_ | ne_ )
arg = Forward()
and_ = Forward()
or_ = Forward()
exp = Group(and_ | or_ | function)
arg << delimitedList(exp)
and_ << Literal("and")("name") + LPAR + Group(arg)("args") + RPAR
or_ << Literal("or")("name") + LPAR + Group(arg)("args") + RPAR
Unfortunately, dump() only handles nesting of results, not lists of values, so it is not quite as nice as json.dumps (maybe this would be a good enhancement to dump?). So here is a custom method to dump out your nested name-args results:
ob = exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))")[0]
INDENT_SPACES = '    '
def dumpExpr(ob, level=0):
    indent = level * INDENT_SPACES
    print (indent + '{')
    print ("%s%s: %r," % (indent+INDENT_SPACES, 'name', ob['name']))
    if ob.name in ('eq','ne'):
        print ("%s%s: %s" % (indent+INDENT_SPACES, 'args', ob.args.asList()))
    else:
        print ("%s%s: [" % (indent+INDENT_SPACES, 'args'))
        for arg in ob.args:
            dumpExpr(arg, level+2)
        print ("%s]" % (indent+INDENT_SPACES))
    print (indent + '}' + (',' if level > 0 else ''))
dumpExpr(ob)
Giving:
{
    name: 'and',
    args: [
        {
            name: 'or',
            args: [
                {
                    name: 'eq',
                    args: ['x', '1']
                },
                {
                    name: 'eq',
                    args: ['x', '2']
                },
            ]
        },
        {
            name: 'eq',
            args: ['y', '3']
        },
    ]
}
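If you want an actual nested dict rather than printed text, the named results can be walked the same way. The following is a sketch only: to_dict is a hypothetical helper name, not a pyparsing function, and it assumes every node carries the "name" and "args" results names defined in the grammar above.

```python
import json
from pyparsing import (CaselessLiteral, Forward, Group, Literal, Suppress,
                       Word, alphanums, delimitedList)

# Same grammar as in the answer, repeated so the sketch is self-contained.
COMMA, LPAR, RPAR = map(Suppress, ",()")
field = Word(alphanums)
value = Word(alphanums)
eq_ = CaselessLiteral("eq")("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
ne_ = CaselessLiteral("ne")("name") + Group(LPAR + field + COMMA + value + RPAR)("args")
function = eq_ | ne_
arg = Forward()
and_ = Forward()
or_ = Forward()
exp = Group(and_ | or_ | function)
arg <<= delimitedList(exp)
and_ <<= Literal("and")("name") + LPAR + Group(arg)("args") + RPAR
or_ <<= Literal("or")("name") + LPAR + Group(arg)("args") + RPAR

def to_dict(node):
    """Hypothetical helper: convert one named parse node into a plain dict."""
    if node["name"] in ("eq", "ne"):
        # Leaf function: args are plain strings such as ['x', '1'].
        args = node["args"].asList()
    else:
        # and/or: each argument is itself a named node, so recurse.
        args = [to_dict(child) for child in node["args"]]
    return {"name": node["name"], "args": args}

tree = to_dict(exp.parseString("and(or(eq(x,1), eq(x,2)), eq(y,3))")[0])
print(json.dumps(tree, indent=4))
```

Because the result is built from plain dicts and lists, it can be passed straight to json.dumps or any other code that knows nothing about pyparsing.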
Upvotes: 11
Reputation: 31514
I don't think pyparsing has something like that, but you can recursively create the data structures:
def toDict(lst):
    if not isinstance(lst[1], list):
        return lst
    return [{'name': name, 'args': toDict(args)}
            for name, args in zip(lst[::2], lst[1::2])]
Your example behaves differently depending on the number of args children: with a single child it uses a dict, otherwise a list of dicts. That makes the structure complicated to use. It's better to use a list of dicts even when there is a single child. This way you always know how to iterate over the children without type-checking.
We can use json.dumps to pretty-print the output (note that here we print parsedict[0] because we know that the root has a single child, but we always return lists, as explained above):
import json
parsed = ['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]
parsedict = toDict(parsed)
print(json.dumps(parsedict[0], indent=4, separators=(',', ': ')))
Output
{
    "name": "and",
    "args": [
        {
            "name": "or",
            "args": [
                {
                    "name": "eq",
                    "args": [
                        "x",
                        "1"
                    ]
                },
                {
                    "name": "eq",
                    "args": [
                        "x",
                        "2"
                    ]
                }
            ]
        },
        {
            "name": "eq",
            "args": [
                "y",
                "3"
            ]
        }
    ]
}
To obtain that output I replaced the dict with a collections.OrderedDict in the toDict function, just to keep name before args.
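The OrderedDict variant is described but not shown, so here is a minimal sketch of what it might look like. The name to_ordered_dict is an assumption chosen for illustration. On Python 3.7 and later, plain dicts also preserve insertion order, so the original toDict would keep name before args there as well.

```python
import json
from collections import OrderedDict

def to_ordered_dict(lst):
    """Hypothetical OrderedDict variant of toDict; keeps 'name' before 'args'."""
    if not isinstance(lst[1], list):
        # Leaf arguments such as ['x', '1'] are returned unchanged.
        return lst
    # Pair each function name (even index) with its argument list (odd index).
    return [OrderedDict([("name", name), ("args", to_ordered_dict(args))])
            for name, args in zip(lst[::2], lst[1::2])]

parsed = ['and', ['or', ['eq', ['x', '1'], 'eq', ['x', '2']], 'eq', ['y', '3']]]
result = to_ordered_dict(parsed)
print(json.dumps(result[0], indent=4, separators=(',', ': ')))
```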
Upvotes: 2