IMTheNachoMan

Reputation: 5811

match regular expression using regex order, not string order in JavaScript

I'm looking to extract a specific pattern from a string, and if it's not there, extract another pattern instead.

The string in question may have both patterns in any order. Regardless of the order, I want the first pattern to take priority.

I know I could do this with multiple lines/match calls, but I am wondering if it is possible to do this with a single match call.

var s1 = "hello A123 B456"
var s2 = "hello B123 A456"

I want to capture the A### pattern first, and only if it's not there, capture the B### pattern.

console.log(s1.match(/((A|B)\d{3})/)[1]); // A123
console.log(s2.match(/((A|B)\d{3})/)[1]); // B123 -- but I want it to capture A456 instead
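
A sketch of the multi-call approach mentioned above, for comparison. `firstByPriority` is a hypothetical helper name, not something from the original post; it simply tries the preferred pattern and falls back to the second one.

```javascript
// Two-call fallback: try the preferred pattern first, and only if it finds
// nothing, try the second pattern. `firstByPriority` is a hypothetical name.
function firstByPriority(str) {
  const a = str.match(/A\d{3}/); // preferred pattern
  if (a) return a[0];
  const b = str.match(/B\d{3}/); // fallback pattern
  return b ? b[0] : null; // null when neither pattern is present
}

console.log(firstByPriority("hello A123 B456")); // A123
console.log(firstByPriority("hello B123 A456")); // A456
console.log(firstByPriority("hello B123"));      // B123
```
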

Upvotes: 2

Views: 1322

Answers (3)

ctwheels

Reputation: 22817

You can accomplish this with either of the following regex patterns:

^(?:.*(A\d{3})|.*(B\d{3}))        # pattern 1
^(?:.*(?=A)|.*(?=B))([AB]\d{3})   # pattern 2

Pattern 1


Pros/Cons: Easy, but uses two capture groups

^(?:.*(A\d{3})|.*(B\d{3}))

How this works:

  • ^ anchors it to the start of the string
  • (?:.*(A\d{3})|.*(B\d{3})) match either of the following options
    • .* matches any character (except newline characters) any number of times (it's greedy so it'll match as much as possible)
    • (A\d{3}) matches A followed by 3 digits

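The second option is the same as the first, but matches B followed by 3 digits. The alternation works because of backtracking, as this trace of the first option on the second test string shows: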

^(?:.*(A\d{3})|.*(B\d{3}))

hello B123 A456
^                  # anchor to the start of the string (the position before the h)
                   # now attempt option 1 of the alternation: .*(A\d{3})
    .*             # match any character any number of times (greedy)
hello B123 A456    # .* first consumes everything, leaving nothing for A\d{3}
                   # so .* backtracks, giving up one character at a time
hello B123         # .* now stops just before the A, and A\d{3} is tried from there
hello B123 A456    # A456 matches; the result you're looking for is in group 1
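
Because `.*` is greedy, the first alternative backtracks from the end of the line, so if several A### tokens are present it captures the last one. A lazy `.*?` expands from the start and would capture the first one instead. The sketch below compares the two; the lazy variant and the `pick` helper are illustrative additions, not part of the answer above.

```javascript
// Greedy `.*` backtracks from the end, so group 1 gets the LAST A###.
const greedy = /^(?:.*(A\d{3})|.*(B\d{3}))/;
// Lazy `.*?` expands from the start, so group 1 gets the FIRST A###.
const lazy = /^(?:.*?(A\d{3})|.*?(B\d{3}))/;

// Hypothetical helper: return whichever group participated, or null.
function pick(re, str) {
  const m = str.match(re);
  return m ? m[1] || m[2] : null;
}

const input = "hello A111 B222 A333";
console.log(pick(greedy, input)); // A333
console.log(pick(lazy, input));   // A111
```

Either variant still prefers an A### over a B###; they only differ in which A### wins when there is more than one.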

Code

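const s = ["hello A123 B456", "hello B123 A456", "hello B123"];
const r = /^(?:.*(A\d{3})|.*(B\d{3}))/;
for (const x of s) {
  const m = x.match(r);
  if (m) {
    // Group 1 holds the A match; group 2 is only set when no A### was found
    console.log(m[1] || m[2]);
  }
}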


Pattern 2


Pros/Cons: Less comprehensible, but uses only one capture group

^(?:.*(?=A)|.*(?=B))([AB]\d{3})

How this works:

  • ^ anchors it to the start of the string
  • (?:.*(?=A)|.*(?=B)) match either of the following options
    • .* matches any character (except newline characters) any number of times (it's greedy so it'll match as much as possible)
    • (?=A) ensures A follows the current position
    • The second alternative, .*(?=B), is the same as the first, but uses (?=B) instead
  • ([AB]\d{3}) match A or B followed by 3 digits
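
The lookaheads only position the engine; the actual token is matched by `([AB]\d{3})`. If that fails after `(?=A)`, the engine backtracks out of the first alternative and tries `.*(?=B)`. The sketch below checks this on a few inputs; the "Axyz" case is an illustrative addition showing that a bare A without 3 digits does not block the B fallback.

```javascript
const r = /^(?:.*(?=A)|.*(?=B))([AB]\d{3})/;

// Each input is paired with what group 1 should contain.
const cases = [
  ["hello B123 A456", "A456"], // an A### exists, so it wins
  ["hello B123 Axyz", "B123"], // "A" is present but not followed by 3 digits
  ["hello B123", "B123"],      // no A at all, so the B alternative is used
];

for (const [input, expected] of cases) {
  const m = input.match(r);
  const got = m ? m[1] : null;
  console.log(input, "->", got, got === expected ? "(ok)" : "(unexpected)");
}
```
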

Code

const s = ["hello A123 B456", "hello B123 A456", "hello B123"];
const r = /^(?:.*(?=A)|.*(?=B))([AB]\d{3})/;
for (const x of s) {
  const m = x.match(r);
  if (m) {
    // Only one capture group, whichever alternative matched
    console.log(m[1]);
  }
}

Upvotes: 1

3limin4t0r

Reputation: 21110

You can also achieve the result with a negative lookahead.

var s1 = "hello A123 B456";
var s2 = "hello B123 A456";

const regex = /A\d{3}|B\d{3}(?!.*A\d{3})/

console.log(s1.match(regex)[0]);
console.log(s2.match(regex)[0]);

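The regex above matches A followed by 3 digits, or B followed by 3 digits only when no A followed by 3 digits appears later in the string (`.` does not cross line breaks). An A### that appears before the B### is found first anyway, because the engine tries each position from left to right.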
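
A sketch wrapping the negative-lookahead regex in a small helper. `preferA` is a hypothetical name, and the null check is an addition: `match` returns `null` when nothing matches, so calling `[0]` on the result directly would throw.

```javascript
const regex = /A\d{3}|B\d{3}(?!.*A\d{3})/;

// Returns the preferred token, or null when neither A### nor B### is present.
function preferA(str) {
  const m = str.match(regex);
  return m ? m[0] : null;
}

console.log(preferA("hello A123 B456"));      // A123 (the A comes first anyway)
console.log(preferA("hello B123 B456 A789")); // A789 (both B tokens fail the lookahead)
console.log(preferA("hello B123"));           // B123 (nothing follows, so the lookahead passes)
console.log(preferA("hello"));                // null
```

Since the whole match is the token, this version needs only `m[0]` and no capture groups.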

Upvotes: 1

georg

Reputation: 214949

I guess you can achieve that by anchoring the preferred option to the start of the subject:

const re = /^.*?(A\d+)|(B\d+)/;

const test = [
  "hello A456 B123",
  "hello B123 A456",
  "hello A456 zzz",
  "hello B123 zzz",
];

for (const t of test) {
  const m = t.match(re);
  // Exactly one of the two groups is set for each of these inputs
  console.log(t, "=", m[1] || m[2]);
}

A drawback is that you have two groups to choose from.
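
One way to hide the two groups behind a single value is a small helper. This is a sketch: `preferred` is a hypothetical name, and it uses optional chaining and `??` (ES2020) so that a string containing neither pattern yields `null` instead of throwing on `m[1]`.

```javascript
const re = /^.*?(A\d+)|(B\d+)/;

// Hypothetical helper: collapse the two capture groups into one value,
// returning null when the regex does not match at all.
function preferred(str) {
  const m = str.match(re);
  return m?.[1] ?? m?.[2] ?? null;
}

console.log(preferred("hello B123 A456")); // A456
console.log(preferred("hello B123 zzz"));  // B123
console.log(preferred("hello zzz"));       // null
```
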

Upvotes: 2
