Reputation: 441
In Excel, I have a column of dates all in different formats.
January 4 -9, 2019
January 22, 2019
January 28, 2019
January 31 - February 3, 2019 ***
January 29 - 31, 2019
October 28 – November 1, 2019 ***
I have this regex:
([\w]+)[\s]+([\d]{1,2})[\s]?[-–]?[\s]?([\w]+)?([\d]+)?[\s]?([\d]{1,2})?,?[\s]?([\d]{4})
which matches the above formats. The problem is for the dates with a second month (Jan/Feb, Oct/Nov) they are in group 3, which is the same group as the second day.
How can I have the second day to be in group 4 so everything is aligned?
What I want:
Group 1: Month 1
Group 2: Day 1
Group 3: Month 2 (if available)
Group 4: Day 2 (if available)
Group 6: Year
The reason I want to group is because I want to be able to split the columns to have START DATE
and END DATE
.
Using the first example:
Column B - START DATE: January 4, 2019
Column C - END DATE: January 9, 2019
If my regex can be simplified, that would be great. I also cannot figure out how to match the year start with 20XX. ([20][\d]{2})
doesn't seem to work.
https://regex101.com/r/5DXqdm/3
Upvotes: 0
Views: 111
Reputation: 27743
I'm not sure if we can do that since there is an optional month in between, maybe you are going to have to add some additional capturing groups and then process those afterwards, other than than I can't think of a better way, or if possible, that'd likely be with look-arounds, which would make the expression pretty complex:
(\w+)\s+((\d{1,2})(?:\s+)?(?:[-–])?(?:\s+)?(\w+)?(\d+)?(?:\s+)?(\d{1,2})?),?(?:\s+)?(\d{4})
If I understand it correctly.
Upvotes: 1
Reputation: 329
This regex pattern causes the second day to always be in group 4:
([A-z]+) (\d{1,2})(?:.*?([A-z]+))?(?:.*?(\d{1,2}))?.*?(20\d{2})
The pattern, broken down, is as follows:
([A-z]+)
matches any string of alpha characters. This matches the first month name. This is followed by a single space.(\d{1,2})
matches a one or two digit day number.(?:.*?([A-z]+))?
matches, if available, any characters followed by matching Group 3: a string of alpha characters ([A-z]+)
. This matches the optional second month.(?:.*?(\d{1,2}))?
matches, if available, any characters followed by matching Group 4: one or two digits (\d{1,2})
. This matches the optional second day number..*?(20\d{2})
matches any characters followed by Group 5: the characters 2
, 0
, then any two digits (20\d{2})
.Upvotes: 0
Reputation: 26660
Does this have to be RegEx? Assuming a data setup like this:
In cell B2 and copied down is this formula for the Start Date:
=IF(ISNUMBER(A2),A2,DATEVALUE(TRIM(LEFT(A2,FIND("-",SUBSTITUTE(A2,CHAR(150),"-")&"-")-1))&", "&RIGHT(A2,4)))
And in cell C2 and copied down is this formula for the End Date:
=IF(ISNUMBER(A2),"",IFERROR(DATEVALUE(TRIM(RIGHT(A2,LEN(A2)-FIND("-",SUBSTITUTE(A2&"-",CHAR(150),"-"))-1))),DATEVALUE(LEFT(A2,FIND(" ",A2))&TRIM(MID(A2,FIND("-",SUBSTITUTE(A2,CHAR(150),"-"))+1,LEN(A2))))))
Format columns B and C as MMMM d, yyyy
Upvotes: 1