Benjamin Gruenbaum

Reputation: 276406

What ECMAScript implementations extend the RegExp syntax?

I know that in JavaScript an implementation is allowed to extend the regular expression grammar:

An implementation may extend the ECMAScript Regular Expression grammar defined in 21.2.1, but it must not extend the RegularExpressionBody and RegularExpressionFlags productions defined below or the productions used by these productions.

Was this ability ever used? Do any existing JavaScript implementations extend the regular expression grammar?

Upvotes: 19

Views: 524

Answers (2)

nhahtdh

Reputation: 56819

Octal escape sequence in RegExp

A widespread application of that clause (which is also present in Section 7.8.5 of the ECMAScript 5.1 specification) is to support octal escape sequences in RegExp.

/a\1b/.test("a\u0001b");
/a\11b/.test("a\tb");

The default RegExp grammar (as described in Section 15.10.1 of ES5.1, or Section 21.2.1 of ES6) doesn't support octal escape sequences, and any decimal escape sequence whose value is larger than the number of capturing groups triggers a SyntaxError. However, many browsers (even old versions) extend the RegExp grammar to support octal escape sequences and evaluate the two lines of code above to true.

Starting from ES6, Annex B, which was an informative annex in the ES3 to ES5.1 specs, becomes a normative annex that requires web browsers to support octal escape sequences for compatibility reasons (non-web-browser hosts can choose to stick to the default grammar).

While previous versions of ECMAScript did address support for octal escape sequences, it was only for numeric and string literals. Backward-compatible RegExp is described for the first time in Section B.1.4 of ES6, which changes the syntax and semantics of RegExp for BMP patterns (patterns without the u flag) to include support for octal escape sequences, among other features.
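To make the "larger than the number of capturing groups" rule concrete, here is a small sketch. It assumes an engine that implements the Annex B grammar (web browsers and Node.js do); in an engine that follows only the default grammar, the second and third literals would be rejected as syntax errors. The variable names are just for illustration.

```javascript
// One capturing group, so \1 is an ordinary backreference to it.
const backref = /(a)\1/.test("aa");

// Still only one capturing group, so \2 cannot be a backreference.
// Under the Annex B grammar it is read as an octal escape for U+0002.
const octalFallback = /(a)\2/.test("a\u0002");

// No capturing groups at all: \101 is octal 101 = decimal 65 = "A".
const octalLetter = /\101/.test("A");

console.log(backref, octalFallback, octalLetter);
```

The same escape therefore means two different things depending on how many groups the pattern contains, which is one reason the u flag drops the legacy escapes.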

Unmatched closing brackets ] and non-range-quantifier with {}

Another common extension (as tested on Firefox 38, Chrome 43 and IE9) is to relax the grammar to allow unmatched closing brackets ] and brace sequences that don't constitute a numbered quantifier, and to interpret them as literal characters.

/^][[]]$/.test("][]"); // Tokens: ^  ]  [[]  ]  $
/^{56, 67}$/.test("{56, 67}"); // Extra space

As with octal escape sequences, the default RegExp grammar (Section 15.10.1 of ES5.1, or Section 21.2.1 of ES6) doesn't allow {, } or ] to be an Atom, as those characters are excluded from the PatternCharacter production.

The grammar in Annex B, Section B.1.4 of ES6 is also extended to interpret non-range-quantifier sequences (sequences that don't match the grammar of QuantifierPrefix) as literal strings, via the Atom[U] :: PatternCharacter production.

However, the extended grammar doesn't allow an unmatched closing ], as both the PatternCharacter and PatternCharacterNoBrace productions still disallow ].
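One way to see that these are legacy relaxations rather than part of the core grammar is to compile the same patterns with the u flag, which opts out of them. This is a minimal sketch, assuming an ES6-capable engine; `isSyntaxError` is a made-up helper name, and the RegExp constructor is used so that a rejected pattern can be caught at run time instead of breaking the whole script.

```javascript
// Hypothetical helper: true if compiling the pattern throws a SyntaxError.
function isSyntaxError(source, flags) {
  try {
    new RegExp(source, flags);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

// The two patterns from the examples above.
const patterns = ["^][[]]$", "^{56, 67}$"];

const report = patterns.map((source) => ({
  source,
  rejectedWithoutU: isSyntaxError(source, ""),  // depends on the engine's extensions
  rejectedWithU: isSyntaxError(source, "u"),    // u-mode grammar has no such relaxation
}));

console.log(report);
```

In browsers with the extension, `rejectedWithoutU` should be false for both patterns, while `rejectedWithU` should be true.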

Upvotes: 7

Bergi

Reputation: 664970

Yes, Mozilla's Gecko engine supported the sticky y flag, which was not part of ES5. It eventually became part of ES6.
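For readers who haven't used it, here is a rough sketch of what the sticky flag does as it was standardised in ES6: the match has to start exactly at lastIndex instead of anywhere after it. The pattern and input are just illustrative.

```javascript
const input = "12abc";
const sticky = /[a-z]+/y;

// A non-sticky pattern may skip ahead and finds "abc" anyway.
const unanchored = /[a-z]+/.test(input);

// Sticky at index 0: "1" is not a letter, so the match fails
// (a failed sticky match also resets lastIndex to 0).
sticky.lastIndex = 0;
const atStart = sticky.test(input);

// Sticky at index 2: "abc" starts exactly there, so it matches
// and lastIndex advances past the match.
sticky.lastIndex = 2;
const atTwo = sticky.test(input);
const indexAfter = sticky.lastIndex;

console.log(unanchored, atStart, atTwo, indexAfter);
```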

This ability may be used again when engines start implementing look-behind (I hope they start experimenting before it gets specced).

This is not an exhaustive list, just what first came to my mind. There may be other examples.

Upvotes: 9
