Praful Bagai

Reputation: 17382

Python - Fuzzy in dateutil.parser

I'm trying to find dates in a string. This is what I'm doing.

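from dateutil.parser import parse

def _is_date(string, fuzzy=False):
    # Returns the parsed datetime, or False if parse() cannot find a date.
    try:
        return parse(string, fuzzy=fuzzy)
    except ValueError:
        return False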

It works on some inputs:

>>> _is_date('delivered 22-jun-2022', fuzzy=True)
2022-06-22 00:00:00
>>> _is_date('04 sep, lets meet', fuzzy=True)
2022-09-04 00:00:00

However, it returns incorrect results for others:

>>> _is_date('Ive 4 kids', fuzzy=True)
2022-09-04 00:00:00
>>> _is_date('samsung galaxy m32 (black,', fuzzy=True)
2022-09-23 00:00:32
>>> _is_date('4gb ram..', fuzzy=True)
2022-09-04 00:00:00

How can I fix this? Or is there another way to solve this problem?

Upvotes: 3

Views: 1165

Answers (1)

GreenMatt

Reputation: 18590

The fuzzy flag isn't meant to be used the way you're using it. It is meant for processing strings along the lines of "Today is 9/23/22"; for this example, parse will ignore the "Today is " and parse the date/time portion.

Via experimentation, I found that when called with fuzzy=True, parse will try to interpret any character that is a digit as part of a date. Looking at the examples you expected to yield False:

  • 'Ive 4 kids' returns a date/time of 2022-09-04 00:00:00 - the 4 was taken to be the 4th of the current month
  • 'samsung galaxy m32 (black,' gives 2022-09-23 00:00:32 - the 32 became the number of seconds after midnight today
  • '4gb ram..' - again, the 4 was taken to be the 4th day of the current month

It seems you won't be able to use fuzzy the way you're hoping; somehow you'll have to clean up the strings before you pass them to parse, probably rejecting those that don't have legitimate dates before calling parse.
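One way to do that rejection step is a regex pre-check that only lets a string through to parse if it contains something shaped like a date. The sketch below is only an illustration: the names has_date_like_token and find_date and the patterns themselves are my own assumptions, not part of dateutil, and they only cover English month names and simple numeric dates.

```python
import re

# Hypothetical pre-filter (not part of dateutil): English month names and
# abbreviations, matched as whole words.
_MONTH = (
    r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?"
    r"|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)"
)

_DATE_LIKE = re.compile(
    "|".join([
        r"\b\d{1,2}[\s./-]*" + _MONTH + r"\b",    # day then month: 22-jun, 04 sep
        r"\b" + _MONTH + r"[\s.,/-]*\d{1,2}\b",   # month then day: sep 4, June 22
        r"\b\d{1,4}[/-]\d{1,2}[/-]\d{1,4}\b",     # numeric: 9/23/22, 2022-06-22
    ]),
    re.IGNORECASE,
)


def has_date_like_token(text):
    """Return True if text contains something that looks like a date."""
    return _DATE_LIKE.search(text) is not None


def find_date(text):
    """Return a datetime for text, or None if no plausible date is present."""
    if not has_date_like_token(text):
        return None  # reject before dateutil can guess from a stray digit

    # Imported here so the pre-check above has no third-party dependency.
    from dateutil.parser import parse

    try:
        return parse(text, fuzzy=True)
    except (ValueError, OverflowError):
        return None
```

With this, 'Ive 4 kids', 'samsung galaxy m32 (black,' and '4gb ram..' are rejected before parse ever sees them, while 'delivered 22-jun-2022' and '04 sep, lets meet' pass the pre-check. The patterns are deliberately strict and will still miss some formats or accept odd ones (for example "4 may" inside a sentence), so tune them to your data.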

You might find it instructive to experiment with fuzzy_with_tokens=True instead of fuzzy. With fuzzy_with_tokens set to True, parse returns a two-item tuple: the first item is the datetime object holding the resulting date, and the second is a tuple of the text fragments that were ignored. Also, this might be a useful resource for you: https://dateutil.readthedocs.io/en/stable/parser.html
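For experimenting, a small wrapper along these lines may help. The name parse_with_leftovers is my own, not something dateutil provides; the only dateutil call used is parse with its fuzzy_with_tokens keyword.

```python
def parse_with_leftovers(text):
    """Return (datetime, skipped_fragments), or None if no date is found."""
    # Imported here so the python-dateutil dependency sits next to its only use.
    from dateutil.parser import parse

    try:
        # With fuzzy_with_tokens=True, parse returns a 2-tuple:
        # the datetime and a tuple of the fragments it ignored.
        return parse(text, fuzzy_with_tokens=True)
    except (ValueError, OverflowError):
        return None
```

Calling it on "Today is January 1, 2047 at 8:21:00AM" (the example string from the dateutil docs) should give back the datetime for 2047-01-01 08:21 plus the fragments that were skipped. Inspecting those fragments for your problem strings shows how much of the input parse threw away to arrive at a date, which is a useful hint that the result is not trustworthy.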

Upvotes: 0
