Phillip M. Feldman

Reputation: 546

JavaScript regex split produces too many items

I'm trying to split a string using either commas or whitespace. A comma can optionally be preceded and/or followed by whitespace, and whitespace by itself also counts as a delimiter. The code looks like this:

var answers = s.split(/(\s*,\s*)|\s+/);

If s contains the string 'a b,c', I get a list (array) containing five items instead of the expected three:

['a', undefined, 'b', ',', 'c']

Any advice as to what I'm doing wrong will be appreciated.

Phillip

Upvotes: 5

Views: 1236

Answers (4)

zany

Reputation: 981

In a regex, a capturing group (x) remembers what it matched, and String.prototype.split includes those captures in the array it returns. Use the non-capturing group (?:x) instead. See the Mozilla docs on RegExp for more.
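As a rough illustration of what "remembers the match" means, here is a minimal sketch using RegExp.prototype.exec. The sample string 'b , c' and the variable names are made up for this example and are not from the question.

```javascript
// Capturing group: the result holds the full match plus one slot for the group.
const capturing = /(\s*,\s*)/.exec('b , c');

// Non-capturing group: same match, but no extra slot is recorded.
const nonCapturing = /(?:\s*,\s*)/.exec('b , c');

console.log(capturing.length);    // 2 (full match + group 1)
console.log(nonCapturing.length); // 1 (full match only)
```

split passes those extra slots through to its output, which is where the unexpected items come from.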

Upvotes: 0

Bergi

Reputation: 664599

That's because split also pushes the contents of capturing groups into the result array:

If separator is a regular expression that contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array.

The space between a and b was matched by the \s+ alternative, so the capturing group did not take part in the match and its value was undefined. The comma between b and c was matched by the group, so the captured comma became the fourth item of your array.
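To see this step by step, here is a small sketch that runs the pattern from the question on the same input and inspects the extra slots. The variable name withCapture is only for illustration.

```javascript
const s = 'a b,c';
const withCapture = s.split(/(\s*,\s*)|\s+/);

// Every separator match adds one slot for the capturing group:
//   ' ' matched via \s+       -> group did not participate -> undefined
//   ',' matched via the group -> group captured ','
console.log(withCapture.length);          // 5
console.log(withCapture[1] === undefined); // true
console.log(withCapture[3]);               // ,
```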

To solve the issue, just remove the capturing group:

var answers = s.split(/\s*,\s*|\s+/);

If you had a more complex expression where you needed grouping, you could make it non-capturing like this:

var answers = s.split(/(?:\s*,\s*)|\s+/);
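For comparison, a minimal sketch wrapping the non-capturing version in a helper. The function name splitAnswers and the trim() call are assumptions added here, not part of the original code. Without trim(), input with leading or trailing whitespace would yield empty strings at the ends of the array.

```javascript
// Hypothetical helper: split on commas (with optional surrounding
// whitespace) or on runs of whitespace.
function splitAnswers(input) {
  // trim() is an added assumption: it avoids empty first/last items
  // when the input starts or ends with whitespace.
  return input.trim().split(/(?:\s*,\s*)|\s+/);
}

console.log(splitAnswers('a b,c'));        // [ 'a', 'b', 'c' ]
console.log(splitAnswers('  a , b   c ')); // [ 'a', 'b', 'c' ]
```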

Upvotes: 14

p.s.w.g

Reputation: 149020

If you simply remove the parentheses, it will work:

var s = 'a,b,c';
var answers = s.split(/\s*,\s*|\s+/);
// [ 'a', 'b', 'c' ]

Upvotes: 2

Felix Kling

Reputation: 816472

The contents of capturing groups are added to the result array. From the MDN documentation:

If separator is a regular expression that contains capturing parentheses, then each time separator is matched the results (including any undefined results) of the capturing parentheses are spliced into the output array. However, not all browsers support this capability.

Use non-capturing groups:

/(?:\s*,\s*)|\s+/

Upvotes: 4
