Reputation: 901
I would like to know how to match a date such as "Oct 21, 2014" or "October 21, 2014".
What I have done so far is \b(?:Jan?|?:Feb?|?:Mar?|?:Apr?|?:May?|?:Jun?|?:Jul?|?:Aug?|?:Sep?|?:Oct?|?:Nov?|?:Dec?) [0-9]{1,2}[,] (?:19[7-9]\d|2\d{3})(?=\D|$)
but that doesn't get me anywhere.
Upvotes: 17
Views: 46531
Reputation: 2189
The following can be used in Python for dates whose month string contains mistakes:
"".join((re.compile(r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)(\.)?(\w*)?(\.)?(\s*\d{0,2}\s*),(\s*\d{4})', re.S | re.I).findall('Some wrong date is Septeme 28, 2002date') + ['n/a'])[0])
Output is:
'Septeme 28 2002'
Group 1 is the start of the month name:
(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
Groups 2-4 are optional month suffixes, which may include a dot or word characters:
(\.)?(\w*)?(\.)?
It matches ".", "t." and "tem" in "Sep.", "Sept." and "Septem", respectively.
Group 5 is the day of the month, which may or may not be present; the 0 in {0,2} allows dates without a day number:
(\s*\d{0,2}\s*)
Group 6 is the year:
(\s*\d{4})
\s* stands for optional whitespace (spaces, tabs and so on), zero or more characters.
[0] takes the first match if the list contains several date tuples.
+ ['n/a'] appends an extra list element in case no date matched, so the list always has at least one element and taking element [0] does not raise a 'list index out of range' error.
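The one-liner can be easier to follow when split into steps. Below is a minimal sketch of the same pattern, not the answer's original code: the helper name find_loose_date is hypothetical, and it uses re.search with a default value in place of the findall(...) + ['n/a'] trick.

```python
import re

# Same pattern as the one-liner, written as raw strings and split by group.
LOOSE_DATE = re.compile(
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"  # group 1: start of the month name
    r"(\.)?(\w*)?(\.)?"                                    # groups 2-4: optional month suffix
    r"(\s*\d{0,2}\s*),"                                    # group 5: optional day, then a comma
    r"(\s*\d{4})",                                         # group 6: year
    re.S | re.I,
)


def find_loose_date(text, default="n/a"):
    """Return the first loosely matched date in text, or default if none is found.

    find_loose_date is a hypothetical helper name used only for this sketch.
    """
    match = LOOSE_DATE.search(text)
    if match is None:
        return default
    # Optional groups that did not take part in the match are None,
    # so replace them with empty strings before joining.
    return "".join(part or "" for part in match.groups())


print(find_loose_date("Some wrong date is Septeme 28, 2002date"))
print(find_loose_date("no date in this sentence"))
```

With these inputs the first call prints Septeme 28 2002 and the second prints n/a. Unlike findall, match.groups() yields None for unused optional groups, which is why the join substitutes empty strings.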
Upvotes: 1
Reputation: 11358
This may suffice for your needs.
Keep in mind, however, that you will need more sophisticated validation, such as checking the number of days in a specific month (for example, February has at most 28 days, or 29 in leap years).
(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\s+(\d{1,2}),?\s+(\d{4})
Again, this is a very simple regex and there are surely better solutions out there, but it may be enough for your needs.
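The calendar validation mentioned above (days per month, leap years) does not have to live in the regex. Here is a minimal sketch in Python, with some assumptions: the helper name parse_date and the MONTHS table are hypothetical, and the pattern allows an optional comma after the day so that it covers "Oct 21, 2014". The regex only locates a candidate, and datetime.date rejects impossible dates.

```python
import re
from datetime import date

# Month abbreviation -> month number; avoids any dependence on the system locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

DATE_RE = re.compile(
    r"\b(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|"
    r"Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|"
    r"Dec(?:ember)?)\s+(\d{1,2}),?\s+(\d{4})\b"
)


def parse_date(text):
    """Return a date for the first date-like match in text.

    Returns None if nothing matches or if the match is not a real
    calendar date. parse_date is a hypothetical name for this sketch.
    """
    match = DATE_RE.search(text)
    if match is None:
        return None
    month, day, year = match.groups()
    try:
        # The first three letters of the month are enough to look it up.
        return date(int(year), MONTHS[month[:3]], int(day))
    except ValueError:
        # Raised for impossible dates such as Feb 30 or Feb 29 in a non-leap year.
        return None


print(parse_date("Oct 21, 2014"))
print(parse_date("October 21, 2014"))
print(parse_date("Feb 29, 2015"))
```

The first two calls return the same date, 2014-10-21, and the third returns None because 2015 is not a leap year. Keeping the regex loose and delegating validation to datetime.date is a design choice for this sketch; a stricter regex is possible but quickly becomes hard to read.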
Upvotes: 36