Andrew Li

Reputation: 57972

How to capture specific group with JavaScript RegExp?

Given this sample text extracted from a PDF:

Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19

My goal is to capture all months and days, i.e. it should capture all of the following:

The hard part is capturing the ranges where the months are not the same. I came up with this RegExp:

/(January|February|March|April|May|August|September|October|November|December)\s([0-9]*-?[0-9]+)(\s*-\s*(January|February|March|April|May|August|September|October|November|December)\s([0-9]*-?[0-9]+))?/g

It works great for all except the last two examples listed above. On regexr, it shows that it captures it just fine in capture group #3, but I can't access that in JavaScript. Take this snippet for example:

const string = 'Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19';

const subRegex = '(January|February|March|April|May|August|September|October|November|December)\\s([0-9]*-?[0-9]+)';
const dateRegex = new RegExp(`${subRegex}(\s*-\s*${subRegex})?`, 'g');

console.log(string.match(dateRegex));

It seems like I can capture December 24 and January 4 separately, but not together. Is there any way to capture them together?

Upvotes: 0

Views: 55

Answers (1)

CertainPerformance

Reputation: 371069

You just need to tweak (and perhaps simplify) your original RE a bit:

const str = 'Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19';
// str2 has "December 24-January 4" instead - no spaces
const str2 = 'Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24-January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19';
const re = /(January|February|March|April|May|August|September|October|November|December) [\d-]+([ -]*(January|February|March|April|May|August|September|October|November|December) \d+)?/g;
console.log(str.match(re));
console.log(str2.match(re));
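
A likely reason the template-literal version in the question fails: in an ordinary template literal, `\s` is cooked to a plain `s`, so `(\s*-\s*...)` reaches the RegExp constructor as `(s*-s*...)` and never matches " - ". Here is a minimal sketch of one way around that, using String.raw so the backslashes survive, plus matchAll to read the capture groups (match with the g flag only returns whole matches). The names month, single, datePattern, and results are illustrative, not from the question, and the sample text is shortened.

```javascript
// Shortened sample text from the question (illustrative subset).
const text = 'Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21';

// In a plain template literal the backslash is dropped: this logs "s*-s*".
console.log(`\s*-\s*`);

// String.raw keeps backslashes, so \s and \d reach the RegExp constructor intact.
const month = '(?:January|February|March|April|May|August|September|October|November|December)';
const single = String.raw`${month}\s\d+(?:-\d+)?`; // e.g. "March 18" or "March 18-22"
const datePattern = new RegExp(String.raw`(${single})(?:\s*-\s*(${single}))?`, 'g');

// matchAll yields one match object per hit, including its capture groups.
const results = [...text.matchAll(datePattern)].map((m) => ({
  full: m[0],          // whole match, e.g. "December 24 - January 4"
  start: m[1],         // first date
  end: m[2] || null,   // second date when the range spans two months
}));

console.log(results);
```

With this sketch, the cross-month range comes back as a single match whose start and end dates are available separately, while single dates and same-month ranges simply have no end value.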

Upvotes: 1
