Reputation: 1851
This is a follow-up to "Build a simple parser that is able to parse different date formats using PyParse".
I have a parser that should group one or more users together into a list.
So a.parser('show abc, xyz commits from "Jan 10,2015" to "27/1/2015"')
should group the two usernames into a list [abc, xyz].
For users I have:
keywords = ["select", "show", "team", "from", "to", "commits", "and", "or"]
[select, show, team, _from, _to, commits, _and, _or] = [ CaselessKeyword(word) for word in keywords ]
user = Word(alphas+"."+alphas)
user2 = Combine(user + "'s")
users = OneOrMore((user|user2))
And the grammar is
bnf = (show|select)+Group(users).setResultsName("users")+Optional(team)+(commits).setResultsName("stats")\
+Optional(_from + quotedString.setParseAction(removeQuotes)('from') +\
_to + quotedString.setParseAction(removeQuotes)('to'))
This is erroneous. Can anyone guide me in the right direction? Also, is there a way in pyparsing to selectively decide which group a word should fall under? What I mean is that 'xyz' standalone should go into my user list, but 'xyz team' should go into a team list. If the optional keyword team is provided, then pyparsing should group the word differently.
I haven't been able to find what I am looking for online. Or maybe I haven't been framing my question correctly on Google?
Upvotes: 2
Views: 1332
Reputation: 63709
You are on the right track; see the embedded comments in this update to your parser:
from pyparsing import *
keywords = ["select", "show", "team", "from", "to", "commits", "and", "or"]
[select, show, team, _from, _to, commits, _and, _or] = [ CaselessKeyword(word) for word in keywords ]
# define an expression to prevent matching keywords as user names - used below in users expression
keyword = MatchFirst(map(CaselessKeyword, keywords))
user = Word(alphas+"."+alphas) # ??? what are you trying to define here?
user2 = Combine(user + "'s")
# must not confuse keywords like commit with usernames - and use ungroup to
# unpack single-element token lists
users = ungroup(~keyword + (user|user2))
#~ bnf = (show|select)+Group(users).setResultsName("users")+Optional(team)+(commits).setResultsName("stats") \
#~ + Optional(_from + quotedString.setParseAction(removeQuotes)('from') +
#~ _to + quotedString.setParseAction(removeQuotes)('to'))
def convertToDatetime(tokens):
    # change this code to do your additional parsing/conversion to a Python datetime
    return tokens[0]
timestamp = quotedString.setParseAction(removeQuotes, convertToDatetime)
# similar to your expression
# - use delimitedList instead of OneOrMore to handle comma-separated list of items
# - add distinction of "xxx team" vs "xxx"
# - dropped expr.setResultsName("name") in favor of short notation expr("name")
# - results names with trailing '*' will accumulate like elements into a single
# named result (short notation for setResultsName(name, listAllValues=True) )
# - dropped setResultsName("stats") on keyword "commits", no point to this, commits must always be present
#
bnf = ((show|select)("command") + delimitedList(users("team*") + team | users("user*")) + commits +
Optional(_from + timestamp('from') + _to + timestamp('to')))
test = 'show abc, def team, xyz commits from "Jan 10,2015" to "27/1/2015"'
print(bnf.parseString(test).dump())
Prints:
['show', 'abc', 'def', 'team', 'xyz', 'commits', 'from', 'Jan 10,2015', 'to', '27/1/2015']
- command: show
- from: Jan 10,2015
- team: ['def']
- to: 27/1/2015
- user: ['abc', 'xyz']
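The convertToDatetime placeholder above just returns the string unchanged. As a rough sketch of how it might be filled in, the function below tries a short list of strptime formats in turn. The name convert_to_datetime and the DATE_FORMATS list are illustrative assumptions, not part of the original parser; the two formats are inferred only from the sample strings in the question, so extend the list for whatever other formats you expect.

```python
from datetime import datetime

# Assumed formats, inferred from the two sample strings in the question.
# Note: %b matches English month abbreviations only under an English/C locale.
DATE_FORMATS = ["%b %d,%Y", "%d/%m/%Y"]

def convert_to_datetime(tokens):
    """Parse-action style callback: tokens[0] is the unquoted date string."""
    text = tokens[0].strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError("unrecognized date format: %r" % text)

print(convert_to_datetime(["Jan 10,2015"]))  # 2015-01-10 00:00:00
print(convert_to_datetime(["27/1/2015"]))    # 2015-01-27 00:00:00
```

Because it takes the token list and returns the replacement value, a function like this could be passed to setParseAction in place of convertToDatetime, after removeQuotes.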
Upvotes: 1