Paul Draper

Reputation: 83253

.split().join() returning more characters than originally present

I am seeing odd behavior when splitting a string on a lookahead that contains a capture group.

I sometimes get more characters than in the original string. I would not think that possible.

JavaScript

'ab'.split(/(?=b)/).join('');
'ab'.split(/(?=(?:b))/).join('');
'ab'.split(/(?=(b))/).join('');

'ab'
'ab'
'abb'

Other languages:

Java/Scala

"ab".split("(?=b)").mkString
"ab".split("(?=(?:b))").mkString
"ab".split("(?=(b))").mkString

"ab"
"ab"
"ab"

PHP

implode(preg_split('/(?=b)/', 'ab'));
implode(preg_split('/(?=(?:b))/', 'ab'));
implode(preg_split('/(?=(b))/', 'ab'));

'ab'
'ab'
'ab'

Why does JavaScript wind up with more characters than the original string for the third regex? I have reproduced this with Chrome, Firefox, Opera, and IE 11.


EDIT:

It appears Ruby does the same thing:

'ab'.split(%r{(?=b)}).join
'ab'.split(%r{(?=(?:b))}).join
'ab'.split(%r{(?=(b))}).join

'ab'
'ab'
'abb'

Upvotes: 3

Views: 81

Answers (2)

p.s.w.g

Reputation: 149020

From MDN:

If separator is a regular expression that contains capturing parentheses, then each time separator is matched, the results (including any undefined results) of the capturing parentheses are spliced into the output array.

So when you have a capturing group, even in a lookahead, like this:

'ab'.split(/(?=(b))/)

The result includes a and b, the portions of the string before and after the position where the lookahead matched, and it also includes the text captured by the group inside the lookahead, b. That gives three elements (a, b, b), which join to 'abb'.
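Inspecting the array before joining makes the extra element visible. This is a minimal sketch, assuming an engine that follows the spliced-captures behavior quoted from MDN above; the variable names are only illustrative.

```javascript
// The zero-width lookahead matches at index 1, just before "b".
// The string is cut there, and the captured "b" is spliced in between.
const withCapture = 'ab'.split(/(?=(b))/);
console.log(withCapture);          // [ 'a', 'b', 'b' ]
console.log(withCapture.join('')); // 'abb'

// Without a capture group, only the two pieces of the string remain.
const withoutCapture = 'ab'.split(/(?=b)/);
console.log(withoutCapture);          // [ 'a', 'b' ]
console.log(withoutCapture.join('')); // 'ab'
```

The middle element is the capture, not a piece of the original string, which is why the joined result is longer than the input.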

However, the MDN article goes on to point out:

However, not all browsers support this capability.

So I wouldn't necessarily expect this behavior to be consistent across all browsers.

Upvotes: 4

Casimir et Hippolyte

Reputation: 89557

This happens because JavaScript automatically inserts the text matched by each capture group into the split result.
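The same thing happens with an ordinary, consuming separator, so it is not specific to lookaheads. A small sketch (the comma-separated input is just an illustration, not from the question):

```javascript
// A capture group around the separator keeps the separator in the result.
const kept = 'a,b'.split(/(,)/);
console.log(kept);    // [ 'a', ',', 'b' ]

// A non-capturing group (?:...) matches the same text but adds nothing.
const dropped = 'a,b'.split(/(?:,)/);
console.log(dropped); // [ 'a', 'b' ]
```

So if the group is only needed for grouping, writing it as (?:...) should keep the split output limited to pieces of the original string.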

Upvotes: 0
