Kinyugo

Reputation: 471

How can I capture an optional group in regex?

I have a regex as follows:

    const verseRegex = /(?<chapterBegin>[^\\d+$]*):(?<verseBegin>[^\\d+$]*)-((?<chapterEnd>[^\\d+$]*):)?(?<verseEnd>[^\\d+$]*)/g;

I expect the regex to be able to match the following two strings:

    4:1-13
    4:1-5:20

However, the regex only matches the first string and groups it correctly:

console.log(verseRegex.exec('4:1-13'));

[
  '4:1-13',
  '4',
  '1',
  undefined,
  undefined,
  '13',
  index: 0,
  input: '4:1-13',
  groups: [Object: null prototype] {
    chapterBegin: '4',
    verseBegin: '1',
    chapterEnd: undefined,
    verseEnd: '13'
  }
]

For the second string, null is returned. I have no explanation for the behavior above. When I remove the optional group and rewrite my regex as:

const verseRegex = /(?<chapterBegin>[^\\d+$]*):(?<verseBegin>[^\\d+$]*)-(?<chapterEnd>[^\\d+$]*):(?<verseEnd>[^\\d+$]*)/g;

Now the second string is matched and grouped as expected, but the first fails because the chapterEnd group is no longer optional. How can I rewrite my regex so that it matches and groups both strings?

Upvotes: 1

Views: 173

Answers (1)

Wiktor Stribiżew

Reputation: 626845

Note that the [^\\d+$]* pattern matches zero or more characters other than \, d, + and $. You probably meant to match one or more digits, which requires \d+.
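To make the difference concrete, here is a small sketch that runs both patterns against a few sample strings. The sample inputs and the variable names are made up for illustration.

```javascript
// Inside a regex literal, [^\\d+$] is a negated character class: any
// character except a backslash, the letter "d", "+", or "$".
const loose = /^[^\\d+$]*$/;
// \d+ matches one or more digits and nothing else.
const strict = /^\d+$/;

// Hypothetical sample inputs, chosen only to show where the two differ.
const samples = ['13', 'abc', '1-5:20', 'odd', ''];
const comparison = samples.map((s) => ({
  input: s,
  loose: loose.test(s),
  strict: strict.test(s),
}));
console.table(comparison);
```

The loose pattern accepts letters, punctuation, and even the empty string, while the strict one accepts only the purely numeric input.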

You may use

/^(?<chapterBegin>\d+):(?<verseBegin>\d+)-(?:(?<chapterEnd>\d+):)?(?<verseEnd>\d+)$/

Or, without named capturing groups (for IE, e.g.):

/^(\d+):(\d+)-(?:(\d+):)?(\d+)$/

See the JavaScript demo:

const strs = ['4:1-13','4:1-5:20'];
const rx = /^(?<chapterBegin>\d+):(?<verseBegin>\d+)-(?:(?<chapterEnd>\d+):)?(?<verseEnd>\d+)$/;
for (let s of strs) {
  const results = rx.exec(s);
  console.log(s, results.groups);
}

Output:

4:1-13 {
  "chapterBegin": "4",
  "verseBegin": "1",
  "chapterEnd": undefined,
  "verseEnd": "13"
}
4:1-5:20 {
  "chapterBegin": "4",
  "verseBegin": "1",
  "chapterEnd": "5",
  "verseEnd": "20"
}
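If the captured values are needed as numbers, something along these lines could work. It is only a sketch: parseVerseRange and verseRangeRx are hypothetical names, and treating a missing chapterEnd as "same chapter as chapterBegin" is an assumption about what the caller wants.

```javascript
// Hypothetical helper (not from the original post): turns a reference such as
// "4:1-13" or "4:1-5:20" into numbers, or returns null when it does not match.
const verseRangeRx = /^(?<chapterBegin>\d+):(?<verseBegin>\d+)-(?:(?<chapterEnd>\d+):)?(?<verseEnd>\d+)$/;

function parseVerseRange(ref) {
  const match = verseRangeRx.exec(ref);
  if (match === null) return null; // avoid reading .groups of null

  const { chapterBegin, verseBegin, chapterEnd, verseEnd } = match.groups;
  return {
    chapterBegin: Number(chapterBegin),
    verseBegin: Number(verseBegin),
    // Assumption: no end chapter means the range stays in the starting chapter.
    chapterEnd: Number(chapterEnd === undefined ? chapterBegin : chapterEnd),
    verseEnd: Number(verseEnd),
  };
}

console.log(parseVerseRange('4:1-13'));
console.log(parseVerseRange('4:1-5:20'));
console.log(parseVerseRange('4:1')); // no range part, so no match
```

Checking for null before touching match.groups keeps the helper from throwing on input that does not fit the pattern.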

Old browsers demo:

var strs = ['4:1-13','4:1-5:20'];
var rx = /^(\d+):(\d+)-(?:(\d+):)?(\d+)$/;
for (var i=0; i<strs.length; i++) {
  var results = rx.exec(strs[i]);
  console.log(strs[i], results);
}
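One more thing worth checking, assuming both strings in the question were passed to the same verseRegex object one after the other: a regex with the g flag stores the end of its last match in lastIndex, and the next exec call starts searching from that offset, even on a different string. The sketch below reuses the question's original pattern to show the effect. That this is what produced the null is an inference from the code shown, not something the question confirms; statefulRx and the other variable names are made up.

```javascript
// The question's original pattern, including the g flag.
const statefulRx = /(?<chapterBegin>[^\\d+$]*):(?<verseBegin>[^\\d+$]*)-((?<chapterEnd>[^\\d+$]*):)?(?<verseEnd>[^\\d+$]*)/g;

const first = statefulRx.exec('4:1-13');
const indexAfterFirst = statefulRx.lastIndex; // where the next search will start

// Same regex object, different string: the search starts at indexAfterFirst,
// so only the tail of the new string is examined.
const second = statefulRx.exec('4:1-5:20');

// A failed match resets lastIndex to 0, so a retry starts from the beginning.
const retry = statefulRx.exec('4:1-5:20');

console.log(first[0], indexAfterFirst);
console.log(second);
console.log(retry && retry.groups);
```

Dropping the g flag, as the patterns above do, or setting lastIndex back to 0 before each exec call avoids this carry-over.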

Upvotes: 2
