Reputation: 33
I have a simple sentence - "tok0,084040,tok1,tok2,231108" where 084040 is time (08:40:40) and 231108 is date (23.11.2008)
Following the pyparsing documentation I've written the rules to parse the tokens:
from pyparsing import *
d = Literal(',').suppress()
two_digits = Word(nums, exact=2)
tok0 = Word(nums)
time_token = two_digits("hour") + two_digits("min") + two_digits("sec")
tok1 = Word(alphas)
tok2 = oneOf('A B C')
date_token = two_digits("day") + two_digits("month") + two_digits("year")
grammar = (tok0 + d + time_token + d + tok1 + d + tok2 + d + date_token)
What I want is to have a logical group in my ParseResults consisting of time_token and date_token so that I could use setParseAction, setResultsName for the group. Something like
Group(time_token + date_token)
considering they are not adjacent.
dreamGroup = Group(time_token + date_token)("datetime").setParseAction(myFn)
parseResults = grammar.parseString("123,084040,ABC,A,231108")
datetime = parseResults.datetime
P.S.: The result of the grammar.parseString should be an instance of ParseResults.
Upvotes: 2
Views: 256
Reputation: 33
Changing the parse results at the grammar level solved the problem. So I slightly modified your answer and it worked like a charm.
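import datetime
import pytz
def changeGrammarParseResults(s, loc, toks):
    # named results are strings; convert to int and assume the 2-digit year is 20xx
    year, month, day, hour, minute, second = (
        int(toks.pop(name)) for name in ('year', 'month', 'day', 'hour', 'min', 'sec'))
    toks['datetime_8601'] = datetime.datetime(
        2000 + year, month, day, hour, minute, second,
        tzinfo=pytz.utc).isoformat()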
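The conversion itself can be checked without running the parser. This is only a minimal sketch under a few assumptions: the fields arrive as two-digit strings keyed by the result names from the question's grammar (including min and sec), every two-digit year belongs to the 2000s, and the standard library's datetime.timezone.utc stands in for pytz.utc. The helper name to_iso8601 and the sample dict are hypothetical.

```python
import datetime


def to_iso8601(fields):
    """Build a UTC ISO 8601 string from two-digit string fields."""
    names = ('year', 'month', 'day', 'hour', 'min', 'sec')
    year, month, day, hour, minute, second = (int(fields[n]) for n in names)
    # Assumption: every two-digit year belongs to the 2000s.
    stamp = datetime.datetime(2000 + year, month, day, hour, minute, second,
                              tzinfo=datetime.timezone.utc)
    return stamp.isoformat()


# Hypothetical input: the named fields for "084040" and "231108".
sample = {'hour': '08', 'min': '40', 'sec': '40',
          'day': '23', 'month': '11', 'year': '08'}
iso = to_iso8601(sample)
print(iso)  # 2008-11-23T08:40:40+00:00
```

Converting to int before calling datetime.datetime matters here, because the constructor rejects the string values that the grammar produces.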
Upvotes: 1
Reputation: 63719
You can add results names in the body of a parse action, and they will remain with the parsed tokens.
def addDateTimeResults(tokens):
    tokens['date'] = ('20'+tokens.year, tokens.month, tokens.day)
    tokens['time'] = (tokens.hour, tokens.min, tokens.sec)
    tokens['datetime'] = ParseResults([tokens.date, tokens.time])
    for name in ('date', 'time'):
        tokens['datetime'][name] = tokens[name]
grammar.setParseAction(addDateTimeResults)
Now in your sample code, add a call to dump() to see what you get:
parseResults = grammar.parseString("123,084040,ABC,A,231108")
datetime = parseResults.datetime
print(datetime.dump())
And you get:
[('2008', '11', '23'), ('08', '40', '40')]
- date: ('2008', '11', '23')
- time: ('08', '40', '40')
Or instead of inserting and returning tuples, you can construct an actual Python datetime object, and return that instead:
import datetime
def addDateTimeResults(tokens):
    # build a list (not a lazy map object) so the year can be adjusted in place
    dtfields = [int(tokens[fld]) for fld in "year month day hour min sec".split()]
    # adjust 2-digit year for 21st century
    dtfields[0] += 2000
    tokens['datetime'] = datetime.datetime(*dtfields)
Now print(parseResults.datetime) gives:
2008-11-23 08:40:40
which is the default string representation of a Python datetime object.
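For anyone who wants to run this end to end, here is a self-contained sketch that combines the grammar from the question with the datetime-building parse action. It assumes pyparsing is installed and that every two-digit year means 20xx. The function name add_datetime is just an illustrative rename, and explicit imports replace the star import.

```python
import datetime

from pyparsing import Literal, Word, alphas, nums, oneOf

# Grammar from the question: comma-separated fields, commas suppressed.
d = Literal(',').suppress()
two_digits = Word(nums, exact=2)

tok0 = Word(nums)
time_token = two_digits("hour") + two_digits("min") + two_digits("sec")
tok1 = Word(alphas)
tok2 = oneOf('A B C')
date_token = two_digits("day") + two_digits("month") + two_digits("year")
grammar = tok0 + d + time_token + d + tok1 + d + tok2 + d + date_token


def add_datetime(tokens):
    # Named results are strings, so convert each one to int.
    names = "year month day hour min sec".split()
    fields = [int(tokens[name]) for name in names]
    fields[0] += 2000  # assumption: every 2-digit year is 20xx
    # Attach a new results name to the tokens of the whole grammar.
    tokens['datetime'] = datetime.datetime(*fields)


# The action is attached to the full grammar, so it sees both the
# time fields and the date fields even though they are not adjacent.
grammar.setParseAction(add_datetime)

result = grammar.parseString("123,084040,ABC,A,231108")
print(result.datetime)
```

Attaching the parse action to the whole grammar rather than to a Group is what sidesteps the adjacency problem: by the time the action runs, all six named fields are available on the same ParseResults.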
Upvotes: 3