Trindaz

Reputation: 17879

JavaScript: How to get multiple matches in RegEx .exec results

When I run

/(a)/g.exec('a a a ').length

I get

2

but I thought it should return

3

because there are 3 a's in the string, not 2!

Why is that?

I want to be able to find all occurrences of a pattern with a RegEx and iterate over them.

FWIW: I'm using node.js

Upvotes: 40

Views: 54555

Answers (13)

David Alsh

Reputation: 7677

Here's another solution:

let inputa = `"a" 'b' "c"`
let inputb = `"d" 'e' "f"`
let rx = /["'](.*?)["']/g

function *execAll(expr, input) { 
    while(true) {
        const current = expr.exec(input)
        if (!current) {
            break
        }
        yield current
    }
}

for (const current of execAll(rx, inputa)) {
  console.log(current)
}

console.log(Array.from(execAll(rx, inputb)))

Upvotes: 0

Julien Perrenoud

Reputation: 1591

Expanding on @ajaykools's answer, you can paste the following into a RegExpExtensions.ts file:

declare global {
  interface RegExp {
    execAll(string: string): RegExpExecArray[];
  }
}

RegExp.prototype.execAll = function (
  this: RegExp,
  string: string,
): RegExpExecArray[] {
  const results: RegExpExecArray[] = [];
  let result: RegExpExecArray | null;
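  // Work on a copy so the caller's lastIndex is untouched; keep its flags and add "g" if missing.
  const flags = this.flags.includes("g") ? this.flags : this.flags + "g";
  const regex = new RegExp(this.source, flags);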
  while ((result = regex.exec(string))) {
    results.push(result);
  }
  return results;
};

export {};

Then add import "[...]/RegExpExtensions" once at the top of your index.ts, and you should be able to do the following anywhere in your code:

const matches = /pattern/.execAll("some string")

Upvotes: 0

ajaykools

Reputation: 685

A while loop can help you:

const x = 'a a a a';
const y = /a/g;
let z;
while ((z = y.exec(x)) !== null) {
   console.log(z);     // output: the match array
   console.log(z[0]);  // output: "a"
}

If you add a counter, you get the number of matches:

const x = 'a a a a';
const y = /a/g;
let counter = 0;
let z;
while ((z = y.exec(x)) !== null) {
   console.log(z);     // output: the match array
   console.log(z[0]);  // output: "a"
   counter++;
}
console.log(counter);  // output: 4

This is quite safe: if nothing matches, the loop exits immediately and counter stays 0.

The main point is to show how RegExp exec can be used in a loop to get every match of the same pattern from a string.

Upvotes: 28

daleyjem

Reputation: 2674

Spread <string>.matchAll() into an Array

If you want the matches (with their capture groups) as an array that you can chain Array methods on, all you have to do is spread (...) <string>.matchAll() into a []:

const mapped = [...'a b c'.matchAll(/(?<letter>[abc])/g)]
console.log(mapped)
// Outputs: (3) [Array(2), Array(2), Array(2)]

Now I can do whatever I like with the destructured capture group(s):

const mapped = [...'a b c'.matchAll(/(?<letter>[abc])/g)]
const onlyAorC = mapped.filter(({groups:{letter}}) => ['a','c'].includes(letter))
console.log(onlyAorC)
// Outputs: (2) [Array(2), Array(2)]
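
If you don't need an array, you can also iterate the matchAll() iterator directly. Here is a small sketch of that, reusing the same input and named group; the variable names are just for illustration. It also shows that matchAll() throws a TypeError when the regex lacks the g flag, which is worth knowing before you rely on it:

```javascript
const text = 'a b c';

// Each item is an exec()-style match array with index and groups.
for (const match of text.matchAll(/(?<letter>[abc])/g)) {
  console.log(match.index, match.groups.letter);
}
// 0 a
// 2 b
// 4 c

// Without the g flag, matchAll() refuses to run at all.
let threw = false;
try {
  text.matchAll(/[abc]/);
} catch (err) {
  threw = err instanceof TypeError;
}
console.log(threw); // true
```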

Upvotes: 1

Daniel

Reputation: 1

If you want to iterate through the matches without using while, you can use replace.

Example:

const foo = 'a a a';
foo.replace(/(a)/g, (...regex) => {
  // Do something for each match found
  console.log(regex);
});
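
For reference, the callback receives the full match first, then one argument per capture group, then the offset of the match. A minimal sketch that collects the offsets, assuming you only care about side effects and throw away the string that replace returns (the offsets name is just for illustration):

```javascript
const offsets = [];

'a a a'.replace(/(a)/g, (match, group1, offset) => {
  offsets.push(offset);
  return match; // hand the match back unchanged; the result string is discarded anyway
});

console.log(offsets); // [ 0, 2, 4 ]
```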

Upvotes: 0

Etienne Martin

Reputation: 11619

Encapsulated into a utility function:

const regexExecAll = (str: string, regex: RegExp) => {
  let lastMatch: RegExpExecArray | null;
  const matches: RegExpExecArray[] = [];

  while ((lastMatch = regex.exec(str))) {
    matches.push(lastMatch);

    if (!regex.global) break;
  }

  return matches;
};

Usage:

const matches = regexExecAll("a a a", /(a)/g);

console.log(matches);

Output:

[
  [ 'a', 'a', index: 0, input: 'a a a', groups: undefined ],
  [ 'a', 'a', index: 2, input: 'a a a', groups: undefined ],
  [ 'a', 'a', index: 4, input: 'a a a', groups: undefined ]
]

Upvotes: 0

snnsnn

Reputation: 13698

There are several answers already, but they are unnecessarily complicated. The explicit null comparison on the result is redundant because exec always returns either an array or null.

let text = `How much wood would a woodchuck chuck if a woodchuck could chuck wood?`;
let re = /wood/g;
let lastMatch;

while (lastMatch = re.exec(text)) {
  console.log(lastMatch);
  console.log(re.lastIndex);

  // Avoid infinite loop
  if(!re.global) break;
}

You can move the infinite loop guard into the conditional expression. Note that in this form a non-global regex produces no iterations at all, rather than one.

while (re.global && (lastMatch = re.exec(text))) {
 console.log(lastMatch);
 console.log(re.lastIndex);
}
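
To see why the guard is needed, here is a small sketch (shortened input, illustrative variable names) of what exec does without the g flag. It never advances lastIndex, so every call returns the same first match and an unguarded while loop would never see null:

```javascript
const text = 'wood wood';
const re = /wood/; // no g flag

const first = re.exec(text);
const second = re.exec(text);

console.log(first.index, second.index); // 0 0
console.log(re.lastIndex);              // 0
```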

Upvotes: 2

dx_over_dt

Reputation: 14328

For your example, .match() is your best option. However, if you do need subgroups, you can make a generator function.

function* execAll(str, regex) {
  if (!regex.global) {
    // Without the g flag, exec() never advances and the loop below could run forever.
    throw new TypeError('RegExp must have the global flag to retrieve multiple results.');
  }

  let match;
  while (match = regex.exec(str)) {
    yield match;
  }
}

const matches = execAll('a abbbbb no match ab', /\b(a)(b+)?\b/g);
for (const match of matches) {
  console.log(JSON.stringify(match));
  let otherProps = {};
  for (const [key, value] of Object.entries(match)) {
    if (isNaN(Number(key))) {
      otherProps[key] = value;
    }
  }
  
  console.log(otherProps);
}

While most JS programmers consider polluting a prototype to be bad practice, you could also add this to RegExp.prototype.

if (RegExp.prototype.hasOwnProperty('execAll')) {
  console.error('RegExp prototype already includes a value for execAll.  Not overwriting it.');
} else {
  RegExp.prototype.execAll = function* execAll(str) {
    if (!this.global) {
      // Without the g flag, exec() never advances and the loop below could run forever.
      throw new TypeError('RegExp must have the global flag to retrieve multiple results.');
    }

    let match;
    while (match = this.exec(str)) {
      yield match;
    }
  };
}

const matches = /\b(a)(b+)?\b/g.execAll('a abbbbb no match ab');
console.log(Array.from(matches));

Upvotes: 1

Taki

Reputation: 17654

regexp.exec(str) returns the first match, or the entire match plus the first capture when the regex has a group (e.g. re = /(a)/g), as mentioned in other answers:

const str = 'a a a a a a a a a a a a a';
const re = /a/g;

const result = re.exec(str);
console.log(result);

But it also remembers the position just after that match in the regexp.lastIndex property.

The next call starts to search from regexp.lastIndex and returns the next match.

If there are no more matches then regexp.exec returns null and regexp.lastIndex is set to 0.

const str = 'a a a';
const re = /a/g;

const a = re.exec(str);
console.log('match : ', a, ' lastIndex : ', re.lastIndex);

const b = re.exec(str);
console.log('match : ', b, ' lastIndex : ', re.lastIndex);

const c = re.exec(str);
console.log('match : ', c, ' lastIndex : ', re.lastIndex);

// No more matches: d is null and lastIndex is reset to 0
const d = re.exec(str);
console.log('match : ', d, ' lastIndex : ', re.lastIndex);

// The search starts over, so e is the first match again
const e = re.exec(str);
console.log('match : ', e, ' lastIndex : ', re.lastIndex);

That's why you can use a while loop that stops when the match is null:

const str = 'a a a';
const re = /a/g;

let match;
while (match = re.exec(str)) {
  console.log(match, ' found at : ', match.index);
}

Upvotes: 2

Ωmega

Reputation: 43683

Code:

alert('a a a'.match(/(a)/g).length);

Output:

3

Upvotes: 5

Jeanne Boyarsky

Reputation: 12266

You are only matching the first a. The reason the length is two is that it is finding the first match and the parenthesized group part of the first match. In your case they are the same.

Consider this example.

var a = /b(a)/g.exec('ba ba ba ');
alert(a);

It outputs ba,a. The array length is still 2, but it is more obvious what is going on. "ba" is the full match. "a" is the match for the first parenthesized group.

The MDN documentation supports this - that only the first match and contained groups are returned. To find all matches, you'd use match() as stated by mVChr.
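
To make the difference concrete, here is a small side-by-side sketch using the same input (variable names are just for illustration). Note that match() with the g flag gives you every full match but drops the capture groups:

```javascript
const text = 'ba ba ba ';

// exec() reports one match per call: the full match plus its capture groups.
const first = /b(a)/g.exec(text);
console.log(first.length);       // 2
console.log(first[0], first[1]); // ba a

// match() with the g flag returns every full match, without the groups.
const all = text.match(/b(a)/g);
console.log(all); // [ 'ba', 'ba', 'ba' ]
```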

Upvotes: 8

Andrew Cheong

Reputation: 30293

exec() is returning only the set of captures for the first match, not the set of matches as you expect. So what you're really seeing is $0 (the entire match, "a") and $1 (the first capture)--i.e. an array of length 2. exec() meanwhile is designed so that you can call it again to get the captures for the next match. From MDN:

If your regular expression uses the "g" flag, you can use the exec method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test will also advance the lastIndex property).
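
The parenthetical about test is easy to miss, so here is a quick sketch of it with the question's input (variable names are illustrative). A test call on a global regex consumes a match, and the next exec resumes after it:

```javascript
const re = /a/g;
const str = 'a a a';

console.log(re.lastIndex); // 0
re.test(str);              // consumes the first "a"
console.log(re.lastIndex); // 1

const next = re.exec(str); // resumes from index 1, so this is the second "a"
console.log(next.index);   // 2
console.log(re.lastIndex); // 3
```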

Upvotes: 44

mVChr

Reputation: 50205

You could use match instead:

'a a a'.match(/(a)/g).length  // outputs: 3
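
One caveat: match returns null rather than an empty array when nothing matches, so calling .length directly can throw. A minimal sketch of a guard, assuming the regex has the g flag (countMatches is a made-up helper name, not a built-in):

```javascript
// Hypothetical helper: counts matches of a global regex, treating "no match" as 0.
function countMatches(str, regex) {
  const found = str.match(regex);
  return found === null ? 0 : found.length;
}

console.log(countMatches('a a a', /(a)/g)); // 3
console.log(countMatches('b b b', /(a)/g)); // 0
```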

Upvotes: 35
