Reputation: 4817
According to ECMA-262 §21.1.3.19 String.prototype.split,
String.prototype.split ( separator, limit )
Returns an Array object into which substrings of the result of converting this object to a String have been stored. The substrings are determined by searching from left to right for occurrences of separator; these occurrences are not part of any substring in the returned array, but serve to divide up the String value.
However, I'm seeing behavior I can't explain. Here's the code:
let s = new String("All the world's a stage, And all the men and women merely players;");
console.log(s.split(/( |o)men /));
Expected output:
[
"All the world's a stage, And all the",
'and w',
'merely players;'
]
Actual output:
[
"All the world's a stage, And all the",
' ',
'and w',
'o',
'merely players;'
]
What's happening here? How should I write the pattern so that it splits on " men " or "omen " without the extra ' ' and 'o' entries?
Environment:
~ $ node --version
v13.8.0
A note for myself:
Python 3 behaves the same way.
import re
s = "All the world's a stage, And all the men and women merely players;"
print(re.compile("( |o)men ").split(s))
#=> ["All the world's a stage, And all the", ' ', 'and w', 'o', 'merely players;']
print(re.compile("(?: |o)men ").split(s))
#=> ["All the world's a stage, And all the", 'and w', 'merely players;']
Maybe there's a good reason or a real use case for this behavior, which seems strange (at least to me)...
Upvotes: 1
Views: 94
Reputation: 147176
The String.prototype.split spec also says (in the same paragraph):
The value of separator may be a String of any length or it may be an object, such as a RegExp, that has a @@split method.
If we look at the spec for RegExp.prototype [ @@split ], it says:
If the regular expression contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array.
This explains the behaviour you are seeing. To work around it, use a non-capturing group, i.e.:
let s = new String("All the world's a stage, And all the men and women merely players;");
console.log(s.split(/(?: |o)men /));
Or, since you're only alternating between single characters, use a character class, which is simpler and possibly a little faster:
let s = new String("All the world's a stage, And all the men and women merely players;");
console.log(s.split(/[ o]men /));
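As a quick sanity check, the three patterns can be compared side by side. This is only a sketch: it uses a plain string literal instead of new String (which should not affect the result), and the variable names are mine, not from the question.

```javascript
const s = "All the world's a stage, And all the men and women merely players;";

// Capturing group: the captured ' ' and 'o' are spliced into the result.
const capturing = s.split(/( |o)men /);

// Non-capturing group and character class: only the pieces between matches remain.
const nonCapturing = s.split(/(?: |o)men /);
const charClass = s.split(/[ o]men /);

console.log(capturing.length);    // 5
console.log(nonCapturing.length); // 3
console.log(charClass);
// [ "All the world's a stage, And all the", 'and w', 'merely players;' ]
```

Both workarounds give the same three-element array; only the capturing version carries the extra entries.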
Upvotes: 4
Reputation: 20830
When found, separator is removed from the string and the substrings are returned in an array.
If separator is a regular expression with capturing parentheses, then each time separator matches, the results (including any undefined results) of the capturing parentheses are spliced into the output array.
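As for a use case, one common reason to want this is keeping the delimiters while tokenizing. A minimal sketch, with made-up input strings that are not from the question:

```javascript
// The capturing group keeps each matched operator in the output.
const tokens = "1+2-3".split(/([+-])/);
console.log(tokens); // [ '1', '+', '2', '-', '3' ]

// A group that did not participate in the match is spliced in as undefined.
const parts = "a1b".split(/(x)?1/);
console.log(parts); // [ 'a', undefined, 'b' ]
```

If the delimiters are not wanted, a non-capturing group (?:...) avoids the extra entries.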
Upvotes: 2