Reputation: 12822
I am looking for a way to extract dates (day, month, year) from text. That is, I want to find all dates (or rather, as many as possible) in a human-written string.
Is there a Python regular expression that covers as many date formats as possible?
Comment:
from dateutil.parser import parse
parse(s, fuzzy=True)
works fine, but it is limited to one date per string.
Example:
A program is taking place at sth from 21 January 2013 to 15th of February 2013. Applications for funding will be accepted until April 15, 2012. Notification of acceptance : 1st Aug. or later. Early payment due: 15.10.12. etc. Late: 11/20/12.
Usually (but not always) the convention is more or less consistent within a single entry.
It is easy to write a regex for a few cases, and I can do that. The question is whether one already exists that collects many different formats.
Upvotes: 1
Views: 445
Reputation: 15978
If you want to roll your own, you can take inspiration from the time module of Perl's Regexp::Common (Regexp::Common::time) and its patterns for times and dates.
Be warned: the code (direct link to it) is not trivial.
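As a much smaller starting point in Python, here is a minimal sketch of a hand-rolled pattern that covers only the formats in the question's example. The names MONTH, DAY, YEAR, DATE_RE and find_dates are my own and not taken from any library, and the pattern is an illustration, not a port of Regexp::Common.

```python
import re

# Hypothetical building blocks; extend them for other languages or formats.
MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|"
         r"July?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|"
         r"Nov(?:ember)?|Dec(?:ember)?)\b\.?")
DAY = r"\d{1,2}(?:st|nd|rd|th)?\b"
YEAR = r"(?:\d{4}|\d{2})\b"

DATE_RE = re.compile(
    rf"""
    \b(?:
        {DAY}\s+(?:of\s+)?{MONTH}(?:,?\s+{YEAR})?   # day first, year optional
      | {MONTH}\s+{DAY}(?:,?\s+{YEAR})?             # month first, year optional
      | \d{{1,2}}([./-])\d{{1,2}}\1{YEAR}           # numeric, same separator twice
    )
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_dates(text):
    """Return every substring of `text` that looks like a date, in order."""
    return [m.group(0) for m in DATE_RE.finditer(text)]


example = ("A program is taking place at sth from 21 January 2013 to 15th of "
           "February 2013. Applications for funding will be accepted until "
           "April 15, 2012. Notification of acceptance : 1st Aug. or later. "
           "Early payment due: 15.10.12. etc. Late: 11/20/12.")

print(find_dates(example))
# ['21 January 2013', '15th of February 2013', 'April 15, 2012',
#  '1st Aug.', '15.10.12', '11/20/12']
```

Each matched substring contains a single date, so it could then be passed on its own to a parser such as dateutil's parse. The pattern only locates candidates: it does not validate them (99.99.99 would match), it does not decide between day-first and month-first for numeric dates, and "May" followed by a small number can produce false positives.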
Upvotes: 1
Reputation: 298326
I've had good luck with the parsedatetime module:
# Assumes parsedatetime 1.0 or later, where Calendar is exposed at the package
# top level; older releases used a separate parsedatetime_consts module.
import parsedatetime
pdt = parsedatetime.Calendar()
parsed, code = pdt.parse('''Your string''')
Upvotes: 0