vanowm

Reputation: 10201

Regex capture optional groups

I'm trying to capture 2 groups of numbers, where each group is optional and should only be captured if it contains numbers. Here is a list of all valid combinations that the regex is supposed to match:

  1. 123(456)
  2. 123
  3. (456)
  4. abc(456)
  5. 123(efg)

And these are not valid combinations and should not be matched:

  1. abc(efg)
  2. abc
  3. (efg)

However, my regex fails on combinations #4 and #5, even though they contain numbers.

const list = ["123(456)", "123", "(456)", "abc(456)", "123(def)", "abc(def)", "abc", "(def)"];
const regex = /^(?:(\d+))?(?:\((\d+)\))?$/;

list.map((a,i) => console.log(i+1+". ", a + "=>".padStart(11-a.length," "), JSON.stringify((a.match(regex)||[]).slice(1))));

So the question is: when ? is placed after a group, why doesn't the regex "skip" that group if it matched nothing?

P.S. This regex also captures #4, but not #5: /(?:^|(\d+)?)(?:\((\d+)\))?$/

Upvotes: 1

Views: 497

Answers (3)

bobble bubble

Reputation: 18490

Here is an idea that works without the global flag and is supposed to match only the needed items:

^(?=\D*\d)(\d+)?\D*(?:\((\d*)\))?\D*$
  • ^(?=\D*\d) the lookahead at the ^ start checks that the string contains at least one digit
  • (\d+)? captures leading digits into the optional first group
  • \D* followed by any number of non-digits
  • (?:\((\d*)\))? captures digits in parentheses into the optional second group
  • \D*$ matches any number of \D non-digits up to the $ end

See your JS demo or a demo at regex101 (the [^\d\n] only for multiline demo)
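
In case the demo links are not at hand, here is a minimal sketch that runs the pattern above against the sample list from the question. The helper name captureGroups is made up for this illustration, and mapping unmatched groups to null is just one possible convention:

```javascript
// The pattern from this answer, used without the global flag.
const pattern = /^(?=\D*\d)(\d+)?\D*(?:\((\d*)\))?\D*$/;
const samples = ["123(456)", "123", "(456)", "abc(456)", "123(def)", "abc(def)", "abc", "(def)"];

// captureGroups is a hypothetical helper, not part of the original answer.
// It returns [group1, group2] for a match, or null if the string is rejected.
function captureGroups(text) {
  const m = text.match(pattern);
  // An optional group that did not participate is undefined; report it as null.
  return m ? [m[1] ?? null, m[2] ?? null] : null;
}

for (const s of samples) {
  console.log(s, "=>", JSON.stringify(captureGroups(s)));
}
```

Note that (\d*) also accepts empty parentheses, so an input like "123()" would report an empty string for the second group.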

Upvotes: 2

vanowm

Reputation: 10201

@WiktorStribiżew and @akash had good ideas, but they rely on the global flag, which requires an additional loop to gather all the matches.

For now, I came up with this regex, which matches anything but captures only what I need.

const list = ["123(456)", "123", "(456)", "abc(456)", "123(def)", "abc(def)", "abc", "(def)"];
const regex = /(?:(\d+)|^|[^(]+)+?(?:\((?:(\d+)|\D*)\)|$)+?/;

list.map((a,i) => console.log(i+1+". ", a + "=>".padStart(11-a.length," "), JSON.stringify((a.match(regex)||[]).slice(1))));

Upvotes: 1

akash

Reputation: 587

What you're looking for can be done with lookaheads:

(?=^\d+(?:\(|$))(\d+)|(?=\d+\)$)(\d+)

Rough translation: a number at the start that is followed by an opening bracket (or the end of the line), OR a number followed by a closing bracket at the end of the text.
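
Since each alternative yields its own match, the pattern has to be applied repeatedly to collect both numbers. A rough sketch of one way to do that, assuming the g flag and String.prototype.matchAll (the helper name collect is hypothetical):

```javascript
// Assumption: the pattern above is used with the g flag so both alternatives can be found.
const numbers = /(?=^\d+(?:\(|$))(\d+)|(?=\d+\)$)(\d+)/g;

// collect is a hypothetical helper: returns [first, second], with null where nothing was found.
function collect(text) {
  const result = [null, null];
  for (const m of text.matchAll(numbers)) {
    if (m[1] !== undefined) result[0] = m[1]; // leading number
    if (m[2] !== undefined) result[1] = m[2]; // number before the closing bracket
  }
  return result;
}

for (const s of ["123(456)", "abc(456)", "123(def)", "abc(def)"]) {
  console.log(s, "=>", JSON.stringify(collect(s)));
}
```

With this approach an input without any numbers simply produces no matches, so the caller has to treat [null, null] as "not valid".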

To answer the question about optional capture groups

Yes, marking a group as optional, e.g. (A*)?, does make the whole group optional. In your case the regex as a whole simply does not match, even with the optional parts skipped: the pattern is anchored with ^ and $, and nothing in it can consume the letters (you can verify this with a regex debugger).
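
A small sketch of that behaviour, using the regex from the question (the variable names are only for illustration):

```javascript
// The regex from the question: both groups optional, anchored at both ends.
const original = /^(?:(\d+))?(?:\((\d+)\))?$/;

// "123": the second group is skipped, so the match succeeds
// and the skipped group is reported as undefined.
const onlyFirst = "123".match(original);
console.log(onlyFirst[1], onlyFirst[2]);

// "abc(456)": both groups can be skipped at position 0, but then $ must match
// right there, and no part of the pattern is able to consume "abc".
const withLetters = "abc(456)".match(original);
console.log(withLetters); // null, the whole match fails
```

So the optional groups are skipped as expected; it is the surrounding anchors that reject the string.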

Upvotes: 1
