Reputation: 2148
I have data that looks like this:
data = 'person(firstame="bob", lastname="stewart", dob="2010-0206", hobbies=["reading, singing", "drawing"], is_minor=True)'
I wrote the grammar parsing rules as follows:
quotedString.setParseAction(removeQuotes)
list_of_names = delimitedList(quotedString)
person_start = Literal("person(").suppress()
first = Literal("firstname") + Suppress("=") + quotedString
lastname = Literal("lastname") + Suppress("=") + quotedString
dob = Literal("dob") + Suppress("=") + quotedString
hobbies = Literal("hobbies") + Suppress("=[") + list_of_names + Suppress("]")
is_minor = Literal("is_minor") + Suppress("=") + oneOf("True False")
person_end = Suppress(")")
comma = Literal(",").suppress()
my_data = person_start + first + comma + last + comma + dob +comma + hobbies + comma + is_minor + person_end
result = my_data.parseString(data)
I have three questions:
Upvotes: 1
Views: 345
Reputation: 63729
Your posted code had a few minor typos in it (firstame="bob" in data vs. firstname="bob", lastname vs. last), but after cleaning them up, it looks pretty good. If you print out the result, you get:
['firstname', 'bob', 'lastname', 'stewart', 'dob', '2010-0206',
'hobbies', 'reading, singing', 'drawing', 'is_minor', 'True']
First off, let me suggest that, just as you defined list_of_names (from your earlier SO question "pyparsing string of quoted names") as a possible value type, you define a boolean value to parse True/False values. Using oneOf is good; let's add a parse action to convert from the strings "True" and "False" to actual Python boolean values:
boolean_value = oneOf("True False").setParseAction(lambda t: t[0]=='True')
This is similar to using removeQuotes on quotedString.
The parsed results now look like:
['firstname', 'bob', 'lastname', 'stewart', 'dob', '2010-0206',
'hobbies', 'reading, singing', 'drawing', 'is_minor', True]
Note that True is now not a string, but the Python value True (no quotes around the value).
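As a quick check of the conversion, here is a minimal sketch that parses both words on their own. The variable names parsed_true and parsed_false are just for illustration, and it assumes pyparsing is installed with its camelCase names available.

```python
from pyparsing import oneOf

# Match the literal words True/False and convert each match to a Python bool.
boolean_value = oneOf("True False").setParseAction(lambda t: t[0] == "True")

# parseString returns a ParseResults; element 0 is the converted token.
parsed_true = boolean_value.parseString("True")[0]
parsed_false = boolean_value.parseString("False")[0]

print(type(parsed_true), parsed_true)
print(type(parsed_false), parsed_false)
```

Both values come back as bool rather than str, so they can be used directly in conditions.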
Now to the first part of your question, how to make this into a dict. Pyparsing allows you to define results names for different parts of your grammar, so that after the data is parsed, you can access those values by name. The syntax for doing this used to be to call the method setResultsName:
my_data = (person_start + first.setResultsName("firstname") +
           last.setResultsName("lastname") + ...)
I found this to be kind of cumbersome, and that the expression was harder to read with all the ".setResultsName" method calls. So a while back I changed the API to accept this syntax:
my_data = person_start + first("firstname") + last("lastname") + ...
But what you have defined as first, last, etc. contain more than just the value; they also include the label.
One way to simplify your grammar is to make a small helper method of your own, let's call it named_parameter:
def named_parameter(label, paramtype):
    expr = Literal(label) + Suppress('=') + paramtype(label)
    return expr
Note that label is used to specify both the literal string and the value's results name. Now you can define your grammar as:
first = named_parameter("firstname", quotedString)
last = named_parameter("lastname", quotedString)
dob = named_parameter("dob", quotedString)
hobbies = named_parameter("hobbies", Suppress("[") + list_of_names + Suppress("]"))
is_minor = named_parameter("is_minor", boolean_value)
With the values named, you can then access the parsed results as a Python dict:
print(result["firstname"])
print(result["hobbies"])
prints:
bob
['reading, singing', 'drawing']
Or if you prefer, you can also use object attribute notation:
print(result.firstname)
print(result.hobbies)
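If you want a real dict rather than dict-style access, ParseResults also has an asDict() method. The following is a minimal sketch using just two of the parameters. The name quoted is my own, and I work on a copy of quotedString so the shared module-level expression is not modified.

```python
from pyparsing import Literal, Suppress, quotedString, removeQuotes

# Copy quotedString before attaching the parse action (assumption: we do not
# want to change the shared module-level expression).
quoted = quotedString.copy().setParseAction(removeQuotes)

def named_parameter(label, paramtype):
    # label is both the literal keyword and the results name of the value
    return Literal(label) + Suppress("=") + paramtype(label)

first = named_parameter("firstname", quoted)
last = named_parameter("lastname", quoted)

result = (first + Suppress(",") + last).parseString(
    'firstname="bob", lastname="stewart"')

# Only the named values end up in the dict; the label tokens are not keys.
as_dict = result.asDict()
print(as_dict)
```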
To answer the second part of your question, you asked how to handle the case where the parameters could be out of order. The easiest way to do this is to use delimitedList again:
parameter = first | last | dob | hobbies | is_minor
my_data = person_start + delimitedList(parameter) + person_end
This is not a rigorous parser: it will accept parameter lists that don't have all the parameters, or lists with duplicate parameters. But for existing valid code, it will parse lists with parameters in any order.
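If you do need the stricter behavior, one option is to check the labels after parsing. This is only a sketch under my own assumptions: REQUIRED and check_parameters are hypothetical names, only three of the parameters are included to keep it short, and each parameter is wrapped in Group so that its label is always the first token of its group.

```python
from pyparsing import (Group, Literal, Suppress, delimitedList, oneOf,
                       quotedString, removeQuotes)

quoted = quotedString.copy().setParseAction(removeQuotes)
boolean_value = oneOf("True False").setParseAction(lambda t: t[0] == "True")

def named_parameter(label, paramtype):
    return Literal(label) + Suppress("=") + paramtype(label)

# Hypothetical list of labels that must each appear exactly once.
REQUIRED = ["firstname", "lastname", "is_minor"]

parameter = (named_parameter("firstname", quoted)
             | named_parameter("lastname", quoted)
             | named_parameter("is_minor", boolean_value))
my_data = Suppress("person(") + delimitedList(Group(parameter)) + Suppress(")")

def check_parameters(text):
    """Hypothetical helper: return (missing, duplicated) label lists."""
    labels = [group[0] for group in my_data.parseString(text)]
    missing = [name for name in REQUIRED if name not in labels]
    duplicated = sorted({name for name in labels if labels.count(name) > 1})
    return missing, duplicated

print(check_parameters('person(lastname="stewart", firstname="bob", is_minor=True)'))
print(check_parameters('person(firstname="bob", firstname="rob")'))
```

The grammar stays permissive and the validation lives in ordinary Python, which keeps the error reporting under your control.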
Here's the final parser:
from pyparsing import (Literal, Suppress, delimitedList, oneOf,
                       quotedString, removeQuotes)
quotedString.setParseAction(removeQuotes)
list_of_names = delimitedList(quotedString)
boolean_value = oneOf("True False").setParseAction(lambda t: t[0]=='True')
def named_parameter(label, paramtype):
    # label is both the literal keyword and the results name of the value
    expr = Literal(label) + Suppress('=') + paramtype(label)
    return expr
person_start = Literal("person(").suppress()
first = named_parameter("firstname", quotedString)
last = named_parameter("lastname", quotedString)
dob = named_parameter("dob", quotedString)
hobbies = named_parameter("hobbies", Suppress("[") + list_of_names + Suppress("]"))
is_minor = named_parameter("is_minor", boolean_value)
person_end = Suppress(")")
comma = Literal(",").suppress()
parameter = first | last | dob | hobbies | is_minor
my_data = person_start + delimitedList(parameter) + person_end
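To see the order independence in action, here is a self-contained sketch that feeds the same grammar an input with the parameters shuffled. The shuffled string is my own test input (the question's data with the firstname typo corrected), and quoted is a copy of quotedString so the module-level expression is left alone.

```python
from pyparsing import (Literal, Suppress, delimitedList, oneOf,
                       quotedString, removeQuotes)

quoted = quotedString.copy().setParseAction(removeQuotes)
list_of_names = delimitedList(quoted)
boolean_value = oneOf("True False").setParseAction(lambda t: t[0] == "True")

def named_parameter(label, paramtype):
    return Literal(label) + Suppress("=") + paramtype(label)

parameter = (named_parameter("firstname", quoted)
             | named_parameter("lastname", quoted)
             | named_parameter("dob", quoted)
             | named_parameter("hobbies",
                               Suppress("[") + list_of_names + Suppress("]"))
             | named_parameter("is_minor", boolean_value))
my_data = Suppress("person(") + delimitedList(parameter) + Suppress(")")

# Same fields as the question's data, deliberately in a different order.
shuffled = ('person(is_minor=True, hobbies=["reading, singing", "drawing"], '
            'lastname="stewart", dob="2010-0206", firstname="bob")')
result = my_data.parseString(shuffled)

print(result["firstname"], result["lastname"], result["dob"])
print(list(result["hobbies"]), result["is_minor"])
```

The named values are the same regardless of where each parameter appears in the call.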
Upvotes: 2
Reputation: 142156
You should really break it up so it doesn't depend on literals quite so much. Look for tokens of the form "X = Y" to make it more generic.
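A rough sketch of what that generic version could look like in pyparsing is below. Everything here is my own assumption about how to structure it: identifier, value, pair, call and the results name call_name are illustrative, and only quoted strings, True/False and lists of quoted strings are supported as values.

```python
from pyparsing import (Dict, Group, Suppress, Word, alphanums, alphas,
                       delimitedList, oneOf, quotedString, removeQuotes)

quoted = quotedString.copy().setParseAction(removeQuotes)
boolean_value = oneOf("True False").setParseAction(lambda t: t[0] == "True")
# Group keeps each bracketed list together as a single value.
list_value = Group(Suppress("[") + delimitedList(quoted) + Suppress("]"))

identifier = Word(alphas + "_", alphanums + "_")
value = quoted | boolean_value | list_value
# Each "X=Y" pair becomes its own group; Dict uses the first token as the key.
pair = Group(identifier + Suppress("=") + value)
call = (identifier("call_name") + Suppress("(")
        + Dict(delimitedList(pair)) + Suppress(")"))

result = call.parseString(
    'person(firstname="bob", hobbies=["reading, singing", "drawing"], is_minor=True)')

print(result["call_name"], result["firstname"], result["is_minor"])
print(list(result["hobbies"]))
```

With this shape, adding a new parameter needs no grammar change, at the cost of no longer rejecting unknown keys.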
Alternatively, another option (since it looks like you're trying to parse a Python function call), is something along the lines of:
data = 'person(firstame="bob", lastname="stewart", dob="2010-0206", hobbies=["reading, singing", "drawing"], is_minor=True)'
import ast
d = {}
for kw in ast.parse(data).body[0].value.keywords:
    if isinstance(kw.value, ast.List):
        d[kw.arg] = [el.s for el in kw.value.elts]
    else:
        d[kw.arg] = getattr(kw.value, {ast.Name: 'id', ast.Str: 's'}[type(kw.value)])
# {'dob': '2010-0206', 'lastname': 'stewart', 'is_minor': 'True', 'firstame': 'bob', 'hobbies': ['reading, singing', 'drawing']}
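On recent Python 3 versions, True parses as a constant rather than a name and ast.Str has been deprecated since Python 3.8, so the type lookup above may not work unchanged. A variant that sidesteps the node types, sketched here under the assumption that every argument is a plain literal, is to hand each keyword value to ast.literal_eval. Note that this gives is_minor as a real bool instead of the string 'True', and the data string uses the corrected firstname spelling.

```python
import ast

data = ('person(firstname="bob", lastname="stewart", dob="2010-0206", '
        'hobbies=["reading, singing", "drawing"], is_minor=True)')

# Parse the text as a single expression; its body is the call node.
call = ast.parse(data, mode="eval").body

# literal_eval accepts an AST node and only evaluates literals,
# so nothing from the input is executed.
d = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
print(d)
```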
Upvotes: 1