Piotr Migdal

Reputation: 12822

General Python regex to extract dates (d,m,y) in different formats

I am looking for a way to extract dates (day, month, year) from text. That is, I want to find all dates (or rather, as many as possible) in a human-written string.

Is there a Python regular expression that covers as many date formats as possible?

Comment:

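from dateutil.parser import parse
s = "Applications will be accepted until April 15, 2012."
parse(s, fuzzy=True)  # fuzzy=True skips tokens that are not part of the date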

works fine, but it returns only one date per string.

Example:

A program is taking place at sth from 21 January 2013 to 15th of February 2013. Applications for funding will be accepted until April 15, 2012. Notification of acceptance : 1st Aug. or later. Early payment due: 15.10.12. etc. Late: 11/20/12.

Usually (but not always), the convention is more or less consistent within a single entry.
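Because the convention is usually consistent within one entry, an unambiguous numeric date (one whose first or second field is greater than 12, so it cannot be a month) can tell you whether the other numeric dates in that entry are day-first or month-first. A minimal sketch of that disambiguation step, assuming `.`, `/` or `-` as the separator (used twice), and that two-digit years mean 20xx; `split_numeric_date` is a hypothetical helper, not part of any library:

```python
import re

# Numeric date: 1-2 digit fields, the same separator twice, 2- or 4-digit year.
NUMERIC_DATE = re.compile(r"\b(\d{1,2})([./-])(\d{1,2})\2(\d{4}|\d{2})\b")


def split_numeric_date(text, day_first=True):
    """Return (day, month, year) for the first numeric date in text, or None.

    A field greater than 12 cannot be a month, which fixes the order; if both
    fields are 12 or less, `day_first` (the convention of the entry) decides.
    """
    match = NUMERIC_DATE.search(text)
    if match is None:
        return None
    a, _, b, year = match.groups()
    a, b, year = int(a), int(b), int(year)
    if a > 12 and b > 12:
        return None  # neither field can be a month
    if a > 12 or (b <= 12 and day_first):
        day, month = a, b
    else:
        day, month = b, a
    if year < 100:
        year += 2000  # assumption: two-digit years mean 20xx
    return day, month, year
```

Passing `day_first=False` for an entry already known to use the US month/day/year order keeps ambiguous dates such as 1/2/13 consistent with the rest of that entry.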

It is easy to write a regex for a few cases; I can do that myself. The question is whether there is already one that covers many different formats.
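As a starting point, the formats in the example above fall into three families (day before month name, month name before day, all-numeric), each of which a single pattern can cover. A minimal sketch, assuming English month names and that two-digit years mean 20xx; `extract_dates`, `PATTERNS`, `MON`, `DAY` and `YEAR` are hypothetical names. Numeric dates are read day-first unless the middle field exceeds 12:

```python
import re

# English month names and abbreviations; the first three letters pick the month.
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}
MON = (r"(?P<m>Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?"
       r"|July?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?"
       r"|Dec(?:ember)?)\b\.?")
DAY = r"(?P<d>\d{1,2})(?:st|nd|rd|th)?\b"
YEAR = r"(?P<y>\d{4})\b"

PATTERNS = [
    # Day before month name: "21 January 2013", "15th of February 2013", "1st Aug."
    re.compile(r"\b" + DAY + r"\s+(?:of\s+)?" + MON + r"(?:\s+" + YEAR + r")?",
               re.IGNORECASE),
    # Month name before day: "April 15, 2012"
    re.compile(r"\b" + MON + r"\s+" + DAY + r",?\s+" + YEAR, re.IGNORECASE),
    # All-numeric with one repeated separator: "15.10.12", "11/20/12"
    re.compile(r"\b(?P<a>\d{1,2})([./-])(?P<b>\d{1,2})\2(?P<y>\d{4}|\d{2})\b"),
]


def extract_dates(text):
    """Return (day, month, year) tuples in order of appearance; year may be None."""
    found = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            start, end = match.span()
            # Skip spans overlapping a date already found by an earlier pattern.
            if any(start < e and s < end for s, e, _ in found):
                continue
            groups = match.groupdict()
            if groups.get("m"):
                day = int(groups["d"])
                month = MONTHS[groups["m"][:3].lower()]
            else:
                a, b = int(groups["a"]), int(groups["b"])
                if a > 12 and b > 12:
                    continue  # neither field can be a month
                # Assumption: day-first unless the middle field cannot be a month.
                day, month = (b, a) if b > 12 else (a, b)
            year = groups["y"]
            if year is not None:
                year = int(year)
                if year < 100:
                    year += 2000  # assumption: two-digit years mean 20xx
            found.append((start, end, (day, month, year)))
    found.sort(key=lambda item: item[0])
    return [date for _, _, date in found]
```

Each pattern runs separately because Python's `re` does not allow the same group name twice in one pattern; the matches are then merged by position, and a span already claimed by an earlier pattern is skipped.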

Upvotes: 1

Views: 445

Answers (2)

Robert P

Reputation: 15978

If you want to roll your own, you can take inspiration from the time module of Perl's Regexp::Common (Regexp::Common::time) and the time and date patterns it defines.

Be warned: the code is not trivial.
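Regexp::Common is a Perl library, so it cannot be imported from Python, but the general idea of keeping a table of reusable pieces and assembling patterns from them carries over. A minimal sketch, assuming strftime-style directives and handling only `%d`, `%m`, `%y` and `%Y`; `DIRECTIVES`, `format_to_regex` and `find_dates` are hypothetical names, not part of Regexp::Common or Python's standard library:

```python
import re

# Hypothetical table: each strftime-style directive maps to a regex fragment
# with a named group. Days and months are range-checked by the pattern itself.
DIRECTIVES = {
    "%d": r"(?P<day>0?[1-9]|[12]\d|3[01])",
    "%m": r"(?P<month>0?[1-9]|1[0-2])",
    "%y": r"(?P<year>\d{2})",
    "%Y": r"(?P<year>\d{4})",
}


def format_to_regex(fmt):
    """Compile a format such as "%d.%m.%y" into a regex.

    Non-directive characters match literally. Assumes the format starts and
    ends with a directive, so the surrounding word boundaries make sense.
    """
    parts = re.split(r"(%[dmyY])", fmt)
    body = "".join(DIRECTIVES.get(part, re.escape(part)) for part in parts)
    return re.compile(r"\b" + body + r"\b")


def find_dates(text, formats):
    """Return (day, month, year) tuples, grouped by format in the order given.

    Every format must contain a day, a month and a year directive.
    """
    dates = []
    for fmt in formats:
        for match in format_to_regex(fmt).finditer(text):
            year = int(match.group("year"))
            if year < 100:
                year += 2000  # assumption: two-digit years mean 20xx
            dates.append((int(match.group("day")), int(match.group("month")), year))
    return dates
```

Listing a format such as `%m/%d/%y` for US-style dates makes the field order explicit instead of guessing it; formats that share the same separator and field widths can still match the same text twice.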

Upvotes: 1

Blender

Reputation: 298326

I've had good luck with the parsedatetime module:

import parsedatetime

cal = parsedatetime.Calendar()  # default (English) constants
# parse() returns (time.struct_time, status); a status of 0 means nothing was parsed
parsed, status = cal.parse('''Your string''')

Upvotes: 0
