Reputation: 17382
I'm trying to find dates in a string. This is what I'm doing.
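from dateutil.parser import parse

def _is_date(string, fuzzy=False):
    # Returns the parsed datetime on success, False if parsing fails.
    try:
        return parse(string, fuzzy=fuzzy)
    except ValueError:
        return False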
It works on some inputs:
>>> _is_date('delivered 22-jun-2022', fuzzy=True)
2022-06-22 00:00:00
>>> _is_date('04 sep, lets meet', fuzzy=True)
2022-09-04 00:00:00
However, it returns incorrect results for others:
>>> _is_date('Ive 4 kids', fuzzy=True)
2022-09-04 00:00:00
>>> _is_date('samsung galaxy m32 (black,', fuzzy=True)
2022-09-23 00:00:32
>>> _is_date('4gb ram..', fuzzy=True)
2022-09-04 00:00:00
How can I fix this? Or is there another approach that would work better for this problem?
Upvotes: 3
Views: 1165
Reputation: 18590
The fuzzy flag isn't meant to be used the way you're using it. It is meant for processing strings along the lines of "Today is 9/23/22"; for this example, parse will ignore the "Today is " and parse the date/time portion.
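Via experimentation, I found that when called with fuzzy=True, parse will try to interpret any digit it finds as part of a date or time. Looking at the examples you expected to yield False, that matches the output you posted: in 'Ive 4 kids' and '4gb ram..' the 4 was taken as the day of the month, and in 'samsung galaxy m32 (black,' the 32 ended up in the seconds field. The remaining fields were filled in from the current date.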
It seems you won't be able to use fuzzy the way you're hoping; somehow you'll have to clean up the strings before you pass them to parse, probably rejecting those that don't have legitimate dates before calling parse.
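One way to do that kind of rejection is a regular-expression check that runs before parse. The following is only a minimal sketch: the helper name find_date and the pattern _DATE_HINT are made up for illustration, and the pattern assumes a "legitimate date" means either a day number next to an English month name or a numeric d/m/y-style group. Adjust it to whatever formats your data actually contains.

```python
import re
from dateutil.parser import parse

# Hypothetical pre-filter (not part of dateutil): English month names,
# abbreviated or in full.
_MONTH = (
    r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?"
    r"|aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)"
)
_DATE_HINT = re.compile(
    rf"\b\d{{1,2}}[\s\-/.,]*{_MONTH}\b"              # 22-jun, 04 sep
    rf"|\b{_MONTH}[\s\-/.,]*\d{{1,2}}\b"             # jun 22
    rf"|\b\d{{1,4}}[-/.]\d{{1,2}}[-/.]\d{{1,4}}\b",  # 9/23/22, 2022-06-22
    re.IGNORECASE,
)

def find_date(text):
    """Return a datetime if text looks like it contains a date, else None."""
    if not _DATE_HINT.search(text):
        return None  # rejected before parse ever sees it
    try:
        return parse(text, fuzzy=True)
    except (ValueError, OverflowError):
        return None
```

With this check, 'Ive 4 kids', 'samsung galaxy m32 (black,' and '4gb ram..' are rejected because none of them has a month name or a numeric date group, while 'delivered 22-jun-2022' and '04 sep, lets meet' still reach parse. It is a heuristic, so text such as "may 4 people" would still slip through.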
You might find it instructive to experiment with fuzzy_with_tokens=True instead of fuzzy. With fuzzy_with_tokens set to True, parse returns a two-item tuple: the first item is a datetime object holding the resulting date, and the second is a tuple of the pieces of text that were ignored. Also, this might be a useful resource for you: https://dateutil.readthedocs.io/en/stable/parser.html
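For instance, a quick experiment along these lines shows what parse kept and what it skipped. The input string is the one from the example above; the variable names dt and skipped are just for illustration.

```python
from dateutil.parser import parse

# fuzzy_with_tokens=True returns (datetime, tuple_of_ignored_text)
dt, skipped = parse("Today is 9/23/22", fuzzy_with_tokens=True)

print(dt)       # the datetime parse built from the string
print(skipped)  # the fragments of the string that parse ignored
```

Running the same call on the strings from the question should make it easier to see which digits parse is picking up and how little of the surrounding text it actually uses.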
Upvotes: 0