Julian
Julian

Reputation: 441

How do I create fixed number groups using regex?

In Excel, I have a column of dates all in different formats.

January 4 -9, 2019
January 22, 2019
January 28, 2019
January 31 - February 3, 2019 ***
January 29 - 31, 2019
October 28 – November 1, 2019 ***

I have this regex:

([\w]+)[\s]+([\d]{1,2})[\s]?[-–]?[\s]?([\w]+)?([\d]+)?[\s]?([\d]{1,2})?,?[\s]?([\d]{4})

which matches the above formats. The problem is for the dates with a second month (Jan/Feb, Oct/Nov) they are in group 3, which is the same group as the second day.

How can I have the second day to be in group 4 so everything is aligned?

What I want:

Group 1: Month 1
Group 2: Day 1
Group 3: Month 2 (if available)
Group 4: Day 2 (if available)
Group 6: Year

The reason I want to group is because I want to be able to split the columns to have START DATE and END DATE.

Using the first example:

Column B - START DATE: January 4, 2019
Column C - END DATE: January 9, 2019

If my regex can be simplified, that would be great. I also cannot figure out how to match the year start with 20XX. ([20][\d]{2}) doesn't seem to work.

https://regex101.com/r/5DXqdm/3

Upvotes: 0

Views: 111

Answers (3)

Emma
Emma

Reputation: 27743

I'm not sure if we can do that since there is an optional month in between, maybe you are going to have to add some additional capturing groups and then process those afterwards, other than than I can't think of a better way, or if possible, that'd likely be with look-arounds, which would make the expression pretty complex:

(\w+)\s+((\d{1,2})(?:\s+)?(?:[-–])?(?:\s+)?(\w+)?(\d+)?(?:\s+)?(\d{1,2})?),?(?:\s+)?(\d{4})

If I understand it correctly.

DEMO

Upvotes: 1

Ian
Ian

Reputation: 329

This regex pattern causes the second day to always be in group 4:

([A-z]+) (\d{1,2})(?:.*?([A-z]+))?(?:.*?(\d{1,2}))?.*?(20\d{2})

Demo

The pattern, broken down, is as follows:

  1. Group 1 ([A-z]+) matches any string of alpha characters. This matches the first month name. This is followed by a single space.
  2. Group 2 (\d{1,2}) matches a one or two digit day number.
  3. Non-matching optional group (?:.*?([A-z]+))? matches, if available, any characters followed by matching Group 3: a string of alpha characters ([A-z]+). This matches the optional second month.
  4. Non-matching optional group (?:.*?(\d{1,2}))? matches, if available, any characters followed by matching Group 4: one or two digits (\d{1,2}). This matches the optional second day number.
  5. Finally, .*?(20\d{2}) matches any characters followed by Group 5: the characters 2, 0, then any two digits (20\d{2}).

Upvotes: 0

tigeravatar
tigeravatar

Reputation: 26660

Does this have to be RegEx? Assuming a data setup like this:

enter image description here

In cell B2 and copied down is this formula for the Start Date:

=IF(ISNUMBER(A2),A2,DATEVALUE(TRIM(LEFT(A2,FIND("-",SUBSTITUTE(A2,CHAR(150),"-")&"-")-1))&", "&RIGHT(A2,4)))

And in cell C2 and copied down is this formula for the End Date:

=IF(ISNUMBER(A2),"",IFERROR(DATEVALUE(TRIM(RIGHT(A2,LEN(A2)-FIND("-",SUBSTITUTE(A2&"-",CHAR(150),"-"))-1))),DATEVALUE(LEFT(A2,FIND(" ",A2))&TRIM(MID(A2,FIND("-",SUBSTITUTE(A2,CHAR(150),"-"))+1,LEN(A2))))))

Format columns B and C as MMMM d, yyyy

Upvotes: 1

Related Questions