TheDelta

Reputation: 146

JavaScript split text and regex

I am working with Firefox on Debian, and I don't understand the behaviour of this JavaScript:

var testRegex = /yolo .+ .+/gu;
let test = `yolo 2 abc
yolo 2 abc`;

test = test.split('\n');

for (let t=0; t < test.length; t++)
{
    console.log(test[t], testRegex.exec(test[t]));
}

It prints:

Console result

Something even stranger:

for (let t=0; t < test.length; t++)
{
    console.log(test[t], testRegex.exec(test[t]), test[t].match(testRegex));
}

This prints:

Console result

I don't think it is an encoding problem or a mistake in my code.

What can I do?

Upvotes: 0

Views: 65

Answers (1)

Benjamin Pannell

Reputation: 4093

This is actually expected behaviour, believe it or not. When a JavaScript regex has the g flag, its exec() method is stateful and is intended to be called within a loop. Each subsequent call returns the next match within the string until no further matches are found, at which point null is returned.
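To illustrate the looping use that exec() is designed for, here is a minimal sketch. The pattern and input string are hypothetical and chosen only for the demonstration; they are not from the question.

```javascript
// Hypothetical pattern and input, used only to show exec() in a loop.
const wordRegex = /yolo \d+/g;
const input = "yolo 1, yolo 2, yolo 3";

const found = [];
let m;
// Each call resumes from wordRegex.lastIndex and returns the next match,
// or null once there are no more matches.
while ((m = wordRegex.exec(input)) !== null) {
  found.push({ text: m[0], index: m.index });
}

console.log(found.map((f) => f.text)); // ["yolo 1", "yolo 2", "yolo 3"]
console.log(wordRegex.lastIndex);      // 0, reset after the failed final call
```

Because the loop runs until exec() returns null, lastIndex is back at 0 afterwards and the regex can safely be reused on another string.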

To highlight this in your first example, let's quickly simplify the code a bit and show what values are in each variable.

let testRegex = /yolo .+ .+/gu;
let test = [
  "yolo 2 abc",
  "yolo 2 abc"
];

This results in your calls to testRegex.exec looking something like the following:

testRegex.exec("yolo 2 abc") // => Array ["yolo 2 abc"]
testRegex.exec("yolo 2 abc") // => null

The official documentation for exec() describes this, stating:

If your regular expression uses the "g" flag, you can use the exec() method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test() will also advance the lastIndex property). Note that the lastIndex property will not be reset when searching a different string, it will start its search at its existing lastIndex.

The reason your second example does not run into this issue is that match(), when called with a global regex, resets the lastIndex property to 0 internally. This resets the search position, so the next call to exec() searches from the start of the string.
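The effect on lastIndex can be observed directly. The following sketch reuses the regex from the question on a single line; the variable names are only for illustration.

```javascript
const re = /yolo .+ .+/gu;
const line = "yolo 2 abc";

const first = re.exec(line);         // matches, search started at index 0
const afterFirstExec = re.lastIndex; // 10, the end of the match

const viaMatch = line.match(re);     // global match() starts from 0 itself
const afterMatch = re.lastIndex;     // 0 once match() has finished

const second = re.exec(line);        // matches again, since lastIndex was 0

console.log(afterFirstExec, afterMatch);
console.log(first !== null, viaMatch, second !== null);
```

Without the match() call in between, the second exec() would start at index 10 and return null, which is what the first example in the question shows.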

Coming back to your original example, you could modify it as follows and you would see the behaviour you're expecting:

var testRegex = /yolo .+ .+/gu;
let test = `yolo 2 abc
yolo 2 abc`;

test = test.split('\n');

for (let t=0; t < test.length; t++)
{
    console.log(test[t], testRegex.exec(test[t]));
    testRegex.lastIndex = 0;
}
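As an alternative sketch, assuming you only need the first match on each line, you could drop the g flag. Without g (or y), exec() ignores lastIndex and always searches from the start of the string, so no manual reset is needed. The names plainRegex, lines and results are illustrative only.

```javascript
// Same pattern as in the question, but without the "g" flag.
const plainRegex = /yolo .+ .+/u;

const lines = `yolo 2 abc
yolo 2 abc`.split('\n');

// Every call starts at index 0 because the regex is not global or sticky.
const results = lines.map((l) => plainRegex.exec(l));

for (const [i, r] of results.entries()) {
  console.log(lines[i], r);
}
```

This trades away the ability to iterate over several matches in one string, so it only fits when one match per line is enough.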

Upvotes: 2
