Kermit

Matching possible date elements in range

I'm having difficulty matching the other cases of a date range: the regex below only handles the format with a year on both sides. The end goal is to extract each group and build ISO 8601 dates from them.

Test cases

May 8th – 14th, 2019
November 25th – December 2nd
November 5th, 2018 – January 13th, 2019
September 17th – 23rd

Regex

(\w{3,9})\s([1-9]|[12]\d|3[01])(?:st|nd|rd|th),\s(19|20)\d{2}\s–\s(\w{3,9})\s([1-9]|[12]\d|3[01])(?:st|nd|rd|th),\s(19|20)\d{2}
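For reference, here is how that pattern behaves on the four test cases, sketched in Python's re module (the question doesn't name a language, so Python is an assumption). Only the third case matches, and because \d{2} sits outside (19|20), the year groups capture just the first two digits of the year.

```python
import re

# The regex from the question, copied verbatim (note the literal en dash "–").
QUESTION_RE = re.compile(
    r"(\w{3,9})\s([1-9]|[12]\d|3[01])(?:st|nd|rd|th),\s(19|20)\d{2}"
    r"\s–\s(\w{3,9})\s([1-9]|[12]\d|3[01])(?:st|nd|rd|th),\s(19|20)\d{2}"
)

TEST_CASES = [
    "May 8th – 14th, 2019",
    "November 25th – December 2nd",
    "November 5th, 2018 – January 13th, 2019",
    "September 17th – 23rd",
]

for text in TEST_CASES:
    match = QUESTION_RE.fullmatch(text)
    # Only the case with a comma and year after both days can match.
    print(f"{text!r}: {match.groups() if match else 'no match'}")
```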

I would like to capture each group regardless of whether that part of the date is present, leaving the group empty when it isn't.

For example, for May 8th – 14th, 2019:

Group 1 May
Group 2 8th
Group 3 
Group 4 
Group 5 14th
Group 6 2019

And for November 5th, 2018 – January 13th, 2019:

Group 1 November
Group 2 5th
Group 3 2018
Group 4 January
Group 5 13th
Group 6 2019
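Since the end goal is ISO 8601, here is a rough sketch (in Python, which is an assumption) of the step after matching: turning the six groups, with missing ones as empty strings exactly as listed above, into a start and an end date. The helper name to_iso_range, its default_year parameter and the year-filling rules are hypothetical choices, not part of the question; ranges that contain no year at all have to get one from somewhere else.

```python
from datetime import date
from typing import Optional, Tuple

# English month names, assumed to be spelled out in full as in the test cases.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}


def day_number(day_with_suffix: str) -> int:
    """Drop the two-letter ordinal suffix: '14th' -> 14, '2nd' -> 2."""
    return int(day_with_suffix[:-2])


def to_iso_range(groups: Tuple[str, str, str, str, str, str],
                 default_year: int) -> Tuple[str, str]:
    """Hypothetical helper: six groups ("" = missing) -> two ISO 8601 dates.

    Assumed rules: a missing end month means the start month; a missing
    year is copied from the other date (shifted by one if the range wraps
    into the next year); with no year at all, default_year is used as the
    start year.
    """
    start_month, start_day, start_year, end_month, end_day, end_year = groups
    sm = MONTHS[start_month.strip()]
    em = MONTHS[end_month.strip()] if end_month.strip() else sm
    wraps = em < sm  # e.g. November -> January crosses a year boundary

    sy: Optional[int] = int(start_year) if start_year else None
    ey: Optional[int] = int(end_year) if end_year else None
    if sy is None and ey is None:
        sy = default_year
    if ey is None:
        ey = sy + 1 if wraps else sy
    if sy is None:
        sy = ey - 1 if wraps else ey

    start = date(sy, sm, day_number(start_day))
    end = date(ey, em, day_number(end_day))
    return start.isoformat(), end.isoformat()


print(to_iso_range(("May", "8th", "", "", "14th", "2019"), default_year=2019))
# ('2019-05-08', '2019-05-14')
print(to_iso_range(("November", "5th", "2018", "January", "13th", "2019"), default_year=2019))
# ('2018-11-05', '2019-01-13')
```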

Answers (2)

Adam

This one saves some space by consolidating some of the groupings.

Full regex:

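([A-Za-z]{3,9}) ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, ((?:19|20)\d{2}))? [–-] (?:([A-Za-z]{3,9})\s)?((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, ((?:19|20)\d{2}))?

Separated by group (spaces replaced by \s for readability):

1. ([A-Za-z]{3,9})
   \s
2. ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))
3. (?:,\s((?:19|20)\d{2}))?
   \s[–-]\s
4. (?:([A-Za-z]{3,9})\s)?
5. ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))
6. (?:,\s((?:19|20)\d{2}))?

[–-] accepts either an en dash or a plain hyphen between the two dates. The month class is [A-Za-z] because the shorter [A-z] also matches the ASCII characters [ \ ] ^ _ and ` that sit between Z and a. Group 4 wraps the month and the space after it in a non-capturing group, so only the month name itself ("December") is captured.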

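This version does not use lookarounds, so it should work in any engine that supports non-capturing groups and \d (PCRE, JavaScript, Python, Java, .NET and so on). Optional parts that are absent leave their groups unmatched, which most languages report as undefined, None or null rather than as an empty string.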
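To check the groups, here is a small sketch using Python's re module (Python is an assumption; the regex itself should behave the same in other engines with non-capturing groups). It uses [A-Za-z] for the month class and (?:([A-Za-z]{3,9})\s)? for the optional end month so no trailing space is captured, and it calls Match.groups(default="") so unmatched groups come back as empty strings, which gives the layout asked for in the question. The helper name date_range_groups is hypothetical.

```python
import re

# Adam's pattern with [A-Za-z] for the months and the optional end month
# written as (?:([A-Za-z]{3,9})\s)? so the captured name has no trailing space.
ADAM_RE = re.compile(
    r"([A-Za-z]{3,9}) ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))"
    r"(?:, ((?:19|20)\d{2}))? [–-] "
    r"(?:([A-Za-z]{3,9})\s)?((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))"
    r"(?:, ((?:19|20)\d{2}))?"
)

TEST_CASES = [
    "May 8th – 14th, 2019",
    "November 25th – December 2nd",
    "November 5th, 2018 – January 13th, 2019",
    "September 17th – 23rd",
]


def date_range_groups(text):
    """Return the six groups ("" for absent parts), or None if no match."""
    match = ADAM_RE.fullmatch(text)
    # groups(default="") substitutes "" for groups that did not participate.
    return match.groups(default="") if match else None


for text in TEST_CASES:
    print(date_range_groups(text))
```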

CertainPerformance

To capture an empty string when a group doesn't otherwise match, the general idea is to use (<characters to match>|), i.e. an alternation whose second branch is empty.
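A minimal illustration of the difference in Python's re module (chosen only for illustration; the day/year pattern here is a simplified stand-in, not the full regex): a group written as (x|) always participates and captures "", while an optional group that is skipped is reported as None (JavaScript, for comparison, reports undefined).

```python
import re

# Year written as an empty alternative: (\d{4}|) always participates.
EMPTY_BRANCH = re.compile(r"(\d{1,2}(?:st|nd|rd|th))(?:, )?(\d{4}|)")
# Same thing as an optional group: (?:, (\d{4}))? can be skipped entirely.
OPTIONAL_GROUP = re.compile(r"(\d{1,2}(?:st|nd|rd|th))(?:, (\d{4}))?")

print(EMPTY_BRANCH.fullmatch("14th").groups())        # ('14th', '')
print(OPTIONAL_GROUP.fullmatch("14th").groups())      # ('14th', None)
print(EMPTY_BRANCH.fullmatch("14th, 2019").groups())  # ('14th', '2019')
```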

Try this one:

([A-Za-z]{3,9})\s((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, (?=19|20))?(\d{4}|)\s–\s([A-Za-z]{3,9}|)\s?((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, (?=19|20))?(\d{4}|)

https://regex101.com/r/4UY0WE/1/

When capturing the month (the first group), use a letters-only class such as [A-Za-z]{3,9} rather than \w{3,9}. \w also matches digits, so it could match, e.g., 23rd instead of a month name. The shorter [A-z] is not a safe substitute either: that range also covers the ASCII characters [ \ ] ^ _ and ` that sit between Z and a.

Separated out:

([A-Za-z]{3,9})      # Month ("January")
\s
((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))  # Day of month, including suffix ("23rd")
(?:, (?=19|20))?     # Comma and space, only if a year follows
(\d{4}|)             # Year, or empty
\s–\s                # Separator: en dash with whitespace on both sides
([A-Za-z]{3,9}|)     # End month, or empty when the range stays in one month
\s?

# Day, comma and year of the end date (same as for the start date):
((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))
(?:, (?=19|20))?
(\d{4}|)
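To check the pattern against the four test cases, here is a sketch in Python's re module (the host language is an assumption; the answer only links to regex101). It uses [A-Za-z] for the month class, and the last three lines show why a letters-only class matters: \w accepts digits, and the range A-z accepts a few punctuation characters.

```python
import re

# CertainPerformance's pattern with [A-Za-z] as the month class.
# Every (...|) group always participates, so absent parts come back as "".
CP_RE = re.compile(
    r"([A-Za-z]{3,9})\s((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))"
    r"(?:, (?=19|20))?(\d{4}|)\s–\s([A-Za-z]{3,9}|)\s?"
    r"((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, (?=19|20))?(\d{4}|)"
)

TEST_CASES = [
    "May 8th – 14th, 2019",
    "November 25th – December 2nd",
    "November 5th, 2018 – January 13th, 2019",
    "September 17th – 23rd",
]

for text in TEST_CASES:
    print(CP_RE.fullmatch(text).groups())

# Why the month needs a letters-only class: \w also accepts digits, and the
# range A-z also spans the punctuation between "Z" and "a".
print(bool(re.fullmatch(r"\w{3,9}", "23rd")))        # True
print(bool(re.fullmatch(r"[A-Za-z]{3,9}", "23rd")))  # False
print(bool(re.fullmatch(r"[A-z]{3,9}", "[^_]")))     # True
```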
