JS RegEx replacement of a non-captured group?

Question

I'm currently going through the book "Eloquent JavaScript". There's an exercise at the end of Chapter 9, on Regular Expressions, whose solution I couldn't fully understand. A description of the exercise can be found here.

TL;DR: The objective is to replace single quotes (') with double quotes (") in a given string while keeping the single quotes in contractions, using the replace method with a RegEx of course.

Now, after solving this exercise with my own method, I checked the proposed solution, which looks like this:

console.log(text.replace(/(^|\W)'|'(\W|$)/g, '$1"$2'));

The RegEx looks fine and is quite understandable, but what I fail to understand is the usage of the replacements, mainly why using $2 works. As far as I know, this regular expression will only take one of two paths, either (^|\W)' or '(\W|$), and each of these paths results in a single captured group, so we should only have $1 available. And yet $2 captures what comes after the single quote, without an explicit second capture group doing this in the regular expression. One can argue that there are two groups, but then $2 is capturing a different string than the one intended by the second group.

My questions:

Why is $2 actually a valid string rather than undefined, and what precisely does it refer to?
Is this one of JavaScript's RegEx quirks?
Does this mean $1, $2, ... don't always refer to explicit groups?

Wiktor Stribiżew · Accepted Answer

The backreferences are initialized with an empty string upon each match, so there are no issues if a group is not matched. This is not a quirk; it complies with the ES5 standard.

Here is a quote from Backreferences to Failed Groups:

According to the official ECMA standard, a backreference to a non-participating capturing group must successfully match nothing just like a backreference to a participating group that captured nothing does.

So, when a group does not participate in the match, a backreference to it refers to an empty string, not undefined. It is not a quirk, just a "feature". It is not always what one expects, but it is how it works.
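A minimal sketch of that behaviour, using a made-up pattern /(a)|(b)/ and the input "b" (both are illustrative, not from the exercise). It compares what a non-participating group looks like in three places. Note that the empty string applies to $n in a replacement string; the match array and a replacer callback report undefined for the same group.

```javascript
// Hypothetical pattern: only the second alternative can match "b",
// so group 1 does not participate in the match.
const re = /(a)|(b)/;

// In the match array, a non-participating group is undefined.
const viaExec = re.exec("b")[1];

// In a replacement string, $1 for that same group expands to "".
const viaTemplate = "b".replace(re, "[$1|$2]");

// A replacer function receives undefined for the non-participating group.
let viaCallback = "not set";
"b".replace(re, (match, g1, g2) => {
  viaCallback = g1;
  return match;
});

console.log(viaExec);     // undefined
console.log(viaTemplate); // [|b]
console.log(viaCallback); // undefined
```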

In your scenario, one of the two backreferences is empty on every match, because there are two alternative branches and only one of them matches each time. The point is to restore the character matched by whichever group participated. Both backreferences are used in the replacement because one of them contains the text to restore while the other contains only empty text.
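To make this concrete, here is a small sketch that traces each match of the solution's regex. The input string and the trace array are assumptions chosen for illustration; the regex and the replacement string '$1"$2' are the ones from the question.

```javascript
// Illustrative input (not from the book): one opening and one closing quote.
const text = "he said 'hello' to me";
const re = /(^|\W)'|'(\W|$)/g;

// Record which group participated in each match.
// The callback returns the match unchanged; it is only used for tracing.
const trace = [];
text.replace(re, (match, before, after) => {
  trace.push({ match, before, after });
  return match;
});

// The actual replacement from the proposed solution.
const result = text.replace(re, '$1"$2');

console.log(trace);
// First match  " '": before = " ", after = undefined (first branch)
// Second match "' ": before = undefined, after = " " (second branch)
console.log(result); // he said "hello" to me
```

On each match only one of $1 and $2 holds a character, and the other contributes nothing, so '$1"$2' puts the neighbouring character back on the correct side of the new double quote.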
