Mark

Reputation: 2148

PyParsing a string representing a function

I have data that looks like this:

data = 'person(firstame="bob", lastname="stewart", dob="2010-0206", hobbies=["reading, singing", "drawing"], is_minor=True)'

I wrote the grammar rules as follows:

quotedString.setParseAction(removeQuotes)
list_of_names = delimitedList(quotedString)

person_start = Literal("person(").suppress()
first = Literal("firstname") + Suppress("=") + quotedString
lastname = Literal("lastname") + Suppress("=") + quotedString
dob = Literal("dob") + Suppress("=") + quotedString
hobbies = Literal("hobbies") + Suppress("=[") + list_of_names + Suppress("]")
is_minor = Literal("is_minor") + Suppress("=") + oneOf("True False")
person_end = Suppress(")")
comma = Literal(",").suppress()

my_data = person_start + first +  comma + last + comma + dob +comma + hobbies + comma + is_minor + person_end
result = my_data.parseString(data)

I have three questions:

  1. The above rules work, but is there a better way to write them?
  2. The order of the parameters in my data is not guaranteed, so lastname can come before firstname. How do I handle that?
  3. Ultimately, after parsing I want to put everything into a dict of key:value pairs, e.g. first:"bob", hobbies:["reading", "singing", "drawing"], and so on. What is the best approach?

Upvotes: 1

Views: 345

Answers (2)

PaulMcG

Reputation: 63729

Your posted code had a few minor typos in it (firstame="bob" in data vs. firstname="bob", lastname vs. last), but after cleaning them up, it looks pretty good. If you print out the result, you get:

['firstname', 'bob', 'lastname', 'stewart', 'dob', '2010-0206', 
 'hobbies', 'reading, singing', 'drawing', 'is_minor', 'True']

First off, let me suggest that, just as you defined list_of_names (from your earlier SO question, "pyparsing string of quoted names") as a possible value type, you also define a boolean value type to parse True/False values. Using oneOf is good; let's add a parse action to convert the strings "True" and "False" to actual Python boolean values:

boolean_value = oneOf("True False").setParseAction(lambda t: t[0]=='True')

This is similar to using removeQuotes on quotedString.

The parsed results now look like:

['firstname', 'bob', 'lastname', 'stewart', 'dob', '2010-0206', 
 'hobbies', 'reading, singing', 'drawing', 'is_minor', True]

Note that True is now not a string, but the Python value True (no quotes around the value).

Now to the third part of your question: how to make this into a dict. Pyparsing allows you to define results names for different parts of your grammar, so that after the data is parsed, you can access those values by name. The syntax for doing this used to be to call the method setResultsName:

my_data = (person_start + first.setResultsName("firstname") +
           last.setResultsName("lastname") + ...)

I found this to be kind of cumbersome, and that the expression was harder to read with all the ".setResultsName" method calls. So a while back I changed the API to accept this syntax:

my_data = person_start + first("firstname") + last("lastname") + ...

But what you have defined as first, last, etc. contains more than just the value; each expression also includes the label. The results name should be attached to the value alone.

One way to simplify your grammar is to write a small helper function of your own, let's call it named_parameter:

def named_parameter(label, paramtype):
    expr = Literal(label) + Suppress('=') + paramtype(label)
    return expr

Note that label is used to specify both the literal string and the value's results name. Now you can define your grammar as:

first = named_parameter("firstname", quotedString)
last = named_parameter("lastname", quotedString)
dob = named_parameter("dob", quotedString)
hobbies = named_parameter("hobbies", Suppress("[") + list_of_names + Suppress("]"))
is_minor = named_parameter("is_minor", boolean_value)

With the values named, you can then access the parsed results as a Python dict:

print(result["firstname"])
print(result["hobbies"])

prints:

bob
['reading, singing', 'drawing']

Or if you prefer, you can also use object attribute notation:

print(result.firstname)
print(result.hobbies)

For the second part of your question, how to handle parameters that may appear out of order, the easiest way is to use delimitedList again:

parameter = first | last | dob | hobbies | is_minor
my_data = person_start + delimitedList(parameter) + person_end

This is not a rigorous parser: it will accept parameter lists that are missing some parameters, or lists that repeat a parameter. But for valid input, it will parse lists with parameters in any order.
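If the missing or repeated parameters matter, one way to tighten this up is to validate the labels after parsing. The following is a minimal sketch in plain Python, assuming the labels seen in the input have already been collected into a list (for example, by wrapping each parameter in pyparsing's Group and taking the first token of each group). The names check_labels and EXPECTED are made up for illustration and are not part of pyparsing.

```python
from collections import Counter

# Assumption: the full set of labels a valid person(...) call must carry.
EXPECTED = {"firstname", "lastname", "dob", "hobbies", "is_minor"}


def check_labels(labels, expected=EXPECTED):
    """Raise ValueError unless every expected label appears exactly once."""
    counts = Counter(labels)
    seen = set(counts)

    missing = sorted(expected - seen)
    unknown = sorted(seen - expected)
    duplicated = sorted(name for name, n in counts.items() if n > 1)

    problems = []
    if missing:
        problems.append("missing: " + ", ".join(missing))
    if duplicated:
        problems.append("duplicated: " + ", ".join(duplicated))
    if unknown:
        problems.append("unknown: " + ", ".join(unknown))
    if problems:
        raise ValueError("; ".join(problems))


# Order does not matter; a complete list passes silently.
check_labels(["lastname", "firstname", "dob", "is_minor", "hobbies"])
```

Keeping the check outside the grammar leaves the parser simple and lets the error message name every problem at once.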

Here's the final parser:

from pyparsing import (Literal, Suppress, delimitedList, oneOf,
                       quotedString, removeQuotes)

quotedString.setParseAction(removeQuotes)
list_of_names = delimitedList(quotedString)
boolean_value = oneOf("True False").setParseAction(lambda t: t[0]=='True')

def named_parameter(label, paramtype):
    expr = Literal(label) + Suppress('=') + paramtype(label)
    return expr

person_start = Literal("person(").suppress()
first = named_parameter("firstname", quotedString)
last = named_parameter("lastname", quotedString)
dob = named_parameter("dob", quotedString)
hobbies = named_parameter("hobbies", Suppress("[") + list_of_names + Suppress("]"))
is_minor = named_parameter("is_minor", boolean_value)
person_end = Suppress(")")
comma = Literal(",").suppress()

parameter = first | last | dob | hobbies | is_minor
my_data = person_start + delimitedList(parameter) + person_end

Upvotes: 2

Jon Clements

Reputation: 142156

You should really break it up so it doesn't depend on literals quite so much. Look for tokens of the form "X = Y" to make it more generic.

Another option (since it looks like you're trying to parse a Python function call) is something along the lines of:

data = 'person(firstame="bob", lastname="stewart", dob="2010-0206", hobbies=["reading, singing", "drawing"], is_minor=True)'

import ast
d = {}
for kw in ast.parse(data).body[0].value.keywords:
    if isinstance(kw.value, ast.List):
        d[kw.arg] = [el.s for el in kw.value.elts]
    else:
        d[kw.arg] = getattr(kw.value, {ast.Name: 'id', ast.Str: 's'}[type(kw.value)])

# {'dob': '2010-0206', 'lastname': 'stewart', 'is_minor': 'True', 'firstame': 'bob', 'hobbies': ['reading, singing', 'drawing']} 
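The loop above reflects the node types that Python 2's ast module produced, where True parsed as an ast.Name (hence the string 'True' in the output). On current Python 3 releases, literals parse as ast.Constant and ast.Str is deprecated, so a shorter variant is to hand each keyword's value node to ast.literal_eval. This is a minimal sketch, assuming every argument value is a plain Python literal. The name parse_call_kwargs is made up for illustration, and the data string uses the corrected spelling firstname.

```python
import ast


def parse_call_kwargs(source):
    """Return the keyword arguments of a single call expression as a dict."""
    # mode="eval" parses exactly one expression; its .body is the ast.Call node.
    call = ast.parse(source, mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a function call")
    # literal_eval accepts an AST node and only evaluates literals
    # (strings, numbers, lists, booleans, ...), never arbitrary code.
    return {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}


data = ('person(firstname="bob", lastname="stewart", dob="2010-0206", '
        'hobbies=["reading, singing", "drawing"], is_minor=True)')
d = parse_call_kwargs(data)
print(d["hobbies"], d["is_minor"])
```

With this variant is_minor comes back as the boolean True rather than the string 'True', and a value that is not a literal (such as a nested function call) raises ValueError instead of being silently mishandled.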

Upvotes: 1
