Reputation: 83253
I am seeing strange behavior when splitting a string on a lookahead that contains a capturing group.
Sometimes the joined result has more characters than the original string, which I would not have thought possible.
JavaScript (JsFiddle)
'ab'.split(/(?=b)/).join('');        // 'ab'
'ab'.split(/(?=(?:b))/).join('');    // 'ab'
'ab'.split(/(?=(b))/).join('');      // 'abb'
Other languages:
Java/Scala
"ab".split("(?=b)").mkString        // "ab"
"ab".split("(?=(?:b))").mkString    // "ab"
"ab".split("(?=(b))").mkString      // "ab"
PHP
implode(preg_split('/(?=b)/', 'ab'));        // 'ab'
implode(preg_split('/(?=(?:b))/', 'ab'));    // 'ab'
implode(preg_split('/(?=(b))/', 'ab'));      // 'ab'
Why does JavaScript end up with more characters than the original string for the third regex? I have reproduced this in Chrome, Firefox, Opera, and IE 11.
EDIT:
It appears Ruby does the same thing:
'ab'.split(%r{(?=b)}).join        # 'ab'
'ab'.split(%r{(?=(?:b))}).join    # 'ab'
'ab'.split(%r{(?=(b))}).join      # 'abb'
Upvotes: 3
Reputation: 149020
From MDN:
If separator is a regular expression that contains capturing parentheses, then each time separator is matched, the results (including any undefined results) of the capturing parentheses are spliced into the output array.
So when you have a capturing group, even in a lookahead, like this:
'ab'.split(/(?=(b))/)
The result will include a and b, the portions of the string before and after the position where the lookahead matched, but it will also include the text captured by the group inside the lookahead, b. The array is therefore ['a', 'b', 'b'], which joins to 'abb'.
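A quick way to watch the quoted rule at work, including its "undefined results" clause, is to log the arrays before joining them. This is a minimal sketch; the input 'a1b' and the pattern /(\d)|(x)/ are made-up examples chosen so that one group never takes part in the match, and the output comments show Node's console formatting.

```javascript
// Each time the separator matches, the text captured by every group is
// spliced into the result right after the piece that precedes the match.
const withCapture = 'ab'.split(/(?=(b))/);
console.log(withCapture); // [ 'a', 'b', 'b' ]

// A group that does not take part in the match still gets a slot: undefined.
const withUnusedGroup = 'a1b'.split(/(\d)|(x)/);
console.log(withUnusedGroup); // [ 'a', '1', undefined, 'b' ]

// With a non-capturing group nothing extra is spliced in.
const withoutCapture = 'ab'.split(/(?=(?:b))/);
console.log(withoutCapture); // [ 'a', 'b' ]
```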
However, the MDN article goes on to point out:
However, not all browsers support this capability.
So I wouldn't necessarily expect this behavior to be consistent across all browsers.
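If you have to cope with an engine that might not splice captures into the result, one option is to probe for the behavior at runtime rather than assume it. A minimal sketch; `splitSplicesCaptures` is a hypothetical name, and the probe simply reuses the third expression from the question.

```javascript
// Hypothetical feature test: an engine that follows the behavior MDN describes
// returns ['a', 'b', 'b'] here (the captured 'b' is spliced in), whereas an
// engine that ignored capturing groups in split would return ['a', 'b'].
const splitSplicesCaptures = 'ab'.split(/(?=(b))/).length === 3;

if (splitSplicesCaptures) {
  console.log('split() includes captured groups in its result');
} else {
  console.log('split() ignores captured groups in its result');
}
```

The question reports the splicing behavior in Chrome, Firefox, Opera, and IE 11, so those would take the first branch. Using a non-capturing group avoids depending on the difference at all.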
Upvotes: 4
Reputation: 89557
It is because JavaScript automatically puts the text of capturing groups into the split result.
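Since split inserts every group's text after each match, one way to get only the plain pieces back without editing the pattern is to skip those slots. A minimal sketch, assuming the separator is a RegExp object; `countCapturingGroups` and `splitWithoutCaptures` are hypothetical helper names, and counting groups by appending an empty alternative (so the pattern always matches the empty string) is a trick of this sketch, not something from MDN. Rewriting the group as (?:b), as in the question's second line, remains the simpler fix when you control the pattern.

```javascript
// Hypothetical helper: count capturing groups by forcing a match on the empty
// string via an empty alternative. The match array always has one slot per
// group plus slot 0 for the whole match, whichever alternative matched.
function countCapturingGroups(re) {
  return new RegExp('(?:' + re.source + ')|', re.flags).exec('').length - 1;
}

// Hypothetical helper: split like String.prototype.split, then drop the
// captured text. With n groups the raw result is laid out as
// piece, n captures, piece, n captures, ..., piece.
function splitWithoutCaptures(str, re) {
  const stride = countCapturingGroups(re) + 1;
  return str.split(re).filter((_, i) => i % stride === 0);
}

console.log('ab'.split(/(?=(b))/).join(''));                // abb
console.log(splitWithoutCaptures('ab', /(?=(b))/).join('')); // ab
```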
Upvotes: 0