minch

Reputation: 331

dateutil fuzzy parsing breaks when the input sentence contains the article 'a'

I am looking for a way to extract date information from a string. After reading another SO thread (Extracting date from a string in Python), python-dateutil looks like an ideal solution: it has a fuzzy parsing mode that can pick a date out of free-form text.

Specifically, the call is

dateutil.parser.parse('your string here', fuzzy=True)

This works fine for many types of input strings containing a date, but it fails when the input string contains the determiner 'a', as in:

dateutil.parser.parse('a monkey on March 1, 2015', fuzzy=True)
dateutil.parser.parse("I ate a sandwich on March 1", fuzzy=True)

Both calls raise the same error:

ValueError: Unknown string format

Does anyone know of a good workaround? Why does dateutil.parser break when the input contains the article "a"?

Upvotes: 2

Views: 2553

Answers (2)

minch

Reputation: 331

It looks like the issue in 2.4 is that a ValueError is raised when "a" is matched as an AM marker without a corresponding hour.

In 2.3 and below, no such exception is raised.
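
If the cause is as described above, one possible workaround is to retry without the standalone "a" tokens when the first attempt fails. This is only a minimal sketch: strip_article_a and parse_fuzzy are hypothetical helper names, not part of dateutil, and I have not checked this against every release. The stripping is used only as a fallback, because it would also mangle genuine times such as "10 a.m.".

```python
import re

# Matches a standalone "a" or "A" token plus any whitespace after it.
# It does not touch the letter inside words such as "ate" or "March".
_ARTICLE_RE = re.compile(r"\b[aA]\b\s*")


def strip_article_a(text):
    """Hypothetical helper: drop standalone 'a' tokens from the text."""
    return _ARTICLE_RE.sub("", text)


def parse_fuzzy(text):
    """Hypothetical wrapper: fuzzy-parse, retrying without 'a' on failure."""
    # python-dateutil is third-party, so it is imported only when needed.
    from dateutil.parser import parse

    try:
        return parse(text, fuzzy=True)
    except ValueError:
        # Assumption: the failure came from "a" being read as an AM marker.
        # If the retry also fails, the ValueError propagates to the caller.
        return parse(strip_article_a(text), fuzzy=True)
```

With this, parse_fuzzy('a monkey on March 1, 2015') would fall back to parsing 'monkey on March 1, 2015' if the original string is rejected.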

Upvotes: 0

alecxe

Reputation: 473933

Not sure if this is a good workaround, but there is no error using python-dateutil < 2.4:

>>> import dateutil
>>> from dateutil.parser import parse
>>> parse('a monkey on March 1, 2015', fuzzy=True)
datetime.datetime(2015, 3, 1, 0, 0)
>>> parse("I ate a sandwich on March 1", fuzzy=True)
datetime.datetime(2015, 3, 1, 0, 0)
>>> dateutil.__version__
'2.3'

FYI, here is what I get using 2.4 (the latest release at the moment):

>>> from dateutil.parser import parse
>>> parse('a monkey on March 1, 2015', fuzzy=True)
Traceback (most recent call last):
    ...
    raise ValueError("Unknown string format")
ValueError: Unknown string format

Consider reporting the problem by creating a new issue at the dateutil bug tracker.

Upvotes: 3
