user1939565

Reputation: 203

Parse multiple dates using dateutil

I am trying to parse multiple dates from a string in Python with this code:

from dateutil.parser import _timelex, parser
a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "
p = parser()
info = p.info
def timetoken(token):
  try:
    float(token)
    return True
  except ValueError:
    pass
  return any(f(token) for f in (info.jump,info.weekday,info.month,info.hms,info.ampm,info.pertain,info.utczone,info.tzoffset))

def timesplit(input_string):
  batch = []
  for token in _timelex(input_string):
    if timetoken(token):
      if info.jump(token):
        continue
      batch.append(token)
    else:
      if batch:
        yield " ".join(batch)
        batch = []
  if batch:
    yield " ".join(batch)

for item in timesplit(a):
  print "Found:", item
  print "Parsed:", p.parse(item)

but the code treats "second half" in the string as the second date and raises this error:

raise ValueError, "unknown string format"

ValueError: unknown string format

When I change 'second half' to 'third half' or 'fourth half', it works fine.

Can anyone help me parse this string?

Upvotes: 4

Views: 2019

Answers (2)

root

Reputation: 80346

Your parser can't handle the "second" found by timesplit. If you set the fuzzy parameter to True, it doesn't break, but it doesn't produce anything meaningful either.

from cStringIO import StringIO
for item in timesplit(StringIO(a)):
    print "Found:", item
    print "Parsed:", p.parse(StringIO(item),fuzzy=True)

out:

Found: 12 10 2012
Parsed: 2012-12-10 00:00:00
Found: second
Parsed: 2013-01-11 00:00:00
Found: 20 10 2012
Parsed: 2012-10-20 00:00:00

You have to fix the timesplitting or handle the errors:

opt1:

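Remove info.hms from the tuple of checks in timetoken, so that "second" (which dateutil recognizes as an hour/minute/second unit name) is no longer treated as a time token.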
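A rough sketch of that first option, written for Python 3 and a recent dateutil. It still relies on the private _timelex class from the question, which may change or disappear between releases. The dayfirst=True argument is my own assumption that the dates are written day/month/year; it is not something the question states.

```python
from dateutil.parser import _timelex, parser  # _timelex is private API

p = parser()
info = p.info

def timetoken(token):
    """Return True if the token looks like part of a date."""
    try:
        float(token)
        return True
    except ValueError:
        pass
    # info.hms is deliberately left out, so "second", "minute", etc.
    # no longer count as date tokens.
    checks = (info.jump, info.weekday, info.month, info.ampm,
              info.pertain, info.utczone, info.tzoffset)
    return any(f(token) for f in checks)

def timesplit(input_string):
    """Yield runs of consecutive date-like tokens, joined by spaces."""
    batch = []
    for token in _timelex(input_string):
        if timetoken(token):
            if info.jump(token):
                continue  # skip separators such as " ", "/" and "of"
            batch.append(token)
        elif batch:
            yield " ".join(batch)
            batch = []
    if batch:
        yield " ".join(batch)

text = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "
found = list(timesplit(text))
# Assumption: day comes before month in the input
dates = [p.parse(item, dayfirst=True) for item in found]
```

With this change only the two numeric runs should be yielded, so nothing is left for the parser to choke on.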

opt2:

from cStringIO import StringIO
for item in timesplit(StringIO(a)):
    print "Found:", item
    try:
        print "Parsed:", p.parse(StringIO(item))
    except ValueError:
        print 'Not Parsed!'

out:

Found: 12 10 2012
Parsed: 2012-12-10 00:00:00
Found: second
Parsed: Not Parsed!
Found: 20 10 2012
Parsed: 2012-10-20 00:00:00

Upvotes: 3

Mauro Baraldi

Reputation: 6575

If you only need the dates, you could extract them with a regex and then work with the resulting date strings.

a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "

import re
pattern = re.compile(r'\d{2}/\d{2}/\d{4}')
pattern.findall(a)
['12/10/2012', '20/10/2012']
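To go from the matched strings to real date objects, something like the following might work, using only the standard library. The "%d/%m/%Y" format is an assumption that the dates are day/month/year, which at least fits 20/10/2012.

```python
import re
from datetime import datetime

text = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "

# Raw string so the backslashes reach the regex engine untouched
pattern = re.compile(r"\d{2}/\d{2}/\d{4}")

# Assumption: dates are written day/month/year
dates = [datetime.strptime(match, "%d/%m/%Y") for match in pattern.findall(text)]

for d in dates:
    print(d.date())
```

Note that strptime raises ValueError for a match that has the right shape but is not a real date (for example 99/99/2012), so you may want to wrap the conversion in try/except if the input is untrusted.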

Upvotes: 2
