Reputation: 168
(?:\d{1,2}[\-\/])?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|January|February|March|April|May|June|July|August|September|October|November|December)?[\,\.\s]*(?:\d{1,2}[\-\/\.)\s,]*)+(?:\d{2,4})(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|January|February|March|April|May|June|July|August|September|October|November|December)?[\,\.\s]*(?:\d{1,2}[\-\/\.),]*)
I am trying to extract dates from text in the following formats:
Here's a sample. The problem occurs when it tries to extract dates in these formats: 2020 JAN. 1, 2020 JAN. 01, 2020 Jan. 01, 2020-01-01.
Upvotes: 0
Views: 39
Reputation: 626932
You can use
pattern = r"""(?ix)
\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?) [\s.]* (?:0?[1-9]|[12][0-9]|3[01]) [\s,.]* (?:19|20)(?:\d{2})? # Jan 01 2000
|
(?<!\d)(?:19|20)(?:\d{2})? [\s,.]* (?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?) [\s.]* (?:0?[1-9]|[12][0-9]|3[01])(?!\d) # 2000 Jan 01 (the lookahead keeps two-digit days such as 15 whole)
|
(?<!\d)
(?:
(?:0?[1-9]|1[012])[-/.]?(?:0?[1-9]|[12][0-9]|3[01])[-/.]?(?:19|20)\d\d # MM/dd/yyyy
|
(?:19|20)\d\d[-/.]?(?:0?[1-9]|1[012])[-/.]?(?:0?[1-9]|[12][0-9]|3[01]) # yyyy/MM/dd
)
(?!\d)"""
See the regex demo
The i modifier flag enables case-insensitive matching, and x enables VERBOSE mode, which lets the pattern contain insignificant whitespace and # comments.
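As a rough sketch of how the pattern might be used, here it is compiled and run over the formats from the question. The pattern is repeated so the snippet runs on its own. The names DATE_RE and find_dates are made up for illustration, and the (?!\d) after the day in the year-first branch is an assumption on my part so that days like 15 are not cut off after the first digit.

```python
import re

# Month names, abbreviated or full (same alternation as in the pattern above).
MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?"
         r"|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|(?:Nov|Dec)(?:ember)?)")
DAY = r"(?:0?[1-9]|[12][0-9]|3[01])"
MON_NUM = r"(?:0?[1-9]|1[012])"

# (?ix): case-insensitive + verbose, so spaces and "#" comments are ignored.
pattern = rf"""(?ix)
\b{MONTH} [\s.]* {DAY} [\s,.]* (?:19|20)(?:\d{{2}})?          # Jan 01 2000
|
(?<!\d)(?:19|20)(?:\d{{2}})? [\s,.]* {MONTH} [\s.]* {DAY}(?!\d)  # 2000 Jan 01 (assumed guard)
|
(?<!\d)
(?:
    {MON_NUM}[-/.]?{DAY}[-/.]?(?:19|20)\d\d                  # MM/dd/yyyy
  |
    (?:19|20)\d\d[-/.]?{MON_NUM}[-/.]?{DAY}                  # yyyy/MM/dd
)
(?!\d)"""

DATE_RE = re.compile(pattern)


def find_dates(text):
    """Return every date-like substring in text, left to right."""
    # No capturing groups are used, so findall yields the full matches.
    return DATE_RE.findall(text)


if __name__ == "__main__":
    sample = "Seen on 2020 JAN. 1, 2020 JAN. 01, 2020 Jan. 01, and 2020-01-01."
    for date in find_dates(sample):
        print(date)
```

Note that the pattern only finds date-like text. It does not check that the date really exists (for example 02/31/2020 would still match), so if that matters you would want to validate the matches afterwards, e.g. with datetime.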
Upvotes: 1