gandalf3

Reputation: 1666

Split string into array without deleting delimiter?

I have a string like

 "asdf a  b c2 "

And I want to split it into an array like this:

["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

Using string.split(" ") removes the spaces, resulting in this:

["asdf", "a", "", "b", "c2", ""]

I thought of inserting extra delimiters, e.g.

string.replace(/ /g, "| |").replace(/||/g, "|").split("|");

But this gives an unexpected result.

Upvotes: 25

Views: 7522

Answers (5)

Richie Bendall

Reputation: 9172

Try clean-split:

const cleanSplit = require("clean-split");

cleanSplit("a-b-c", "-");
//=> ["a", "-", "b", "-", "c"]

cleanSplit("a-b-c", "-", { anchor: "before" });
//=> ["a-", "b-", "c"]

cleanSplit("a-b-c", "-", { anchor: "after" });
//=> ["a", "-b", "-c"]

Under the hood, it uses logic adapted from:

In your case, you can do something like this:

const cleanSplit = require("clean-split");

cleanSplit("asdf a  b c2 ", " ");
//=> ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

Upvotes: 0

Ja͢ck

Reputation: 173552

Instead of splitting, it might be easier to think of this as extracting strings comprising either the delimiter or consecutive characters that are not the delimiter:

'asdf a  b c2 '.match(/\S+|\s/g)
// result: ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
'asdf a  b. . c2% * '.match(/\S+|\s/g)
// result: ["asdf", " ", "a", " ", " ", "b.", " ", ".", " ", "c2%", " ", "*", " "]

A more Shakespearean definition of the matches would be:

'asdf a  b c2 '.match(/ |[^ ]+/g)

To or (not to )+.
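The same idea can be sketched for an arbitrary delimiter. The helper name `splitKeepingDelimiter` is hypothetical (it is not part of the answer above), and the sketch assumes the delimiter is a single UTF-16 code unit so that it can be placed inside a character class:

```javascript
// Hypothetical helper: tokenize `str` into individual delimiter characters
// and runs of non-delimiter characters, in their original order.
function splitKeepingDelimiter(str, delimiter) {
  if (delimiter.length !== 1) {
    throw new RangeError("this sketch only handles one-character delimiters");
  }
  // Escape the few characters that are special inside a character class.
  const escaped = delimiter.replace(/[\\\]^-]/g, "\\$&");
  // Either one delimiter, or one or more characters that are not the delimiter.
  const pattern = new RegExp(`[${escaped}]|[^${escaped}]+`, "g");
  // match() returns null when nothing matches (e.g. for an empty string).
  return str.match(pattern) || [];
}

console.log(splitKeepingDelimiter("asdf a  b c2 ", " "));
console.log(splitKeepingDelimiter("a-b--c", "-"));
```

Falling back to an empty array keeps the return type consistent when the input string is empty.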

Upvotes: 23

p.s.w.g

Reputation: 149010

I'm surprised no one has mentioned this yet, but I'll post this here for the sake of completeness. If you have capturing groups in your expression, then .split will include the captured substring as a separate entry in the result array:

"asdf a  b c2 ".split(/( )/)  // or /(\s)/
// ["asdf", " ", "a", " ", "", " ", "b", " ", "c2", " ", ""]

Note, this is not exactly the same as the desired output you specified, as it includes an empty string between the two contiguous spaces and after the last space.

If necessary, you can filter out all empty strings from the result array like this:

"asdf a  b c2 ".split(/( )/).filter(String)
// ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

However, if this is what you're looking for, I'd probably recommend you go with @Jack's solution.
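As a further illustration of the capturing-group behaviour, here is a small sketch that splits on more than one delimiter. The input string and the variable names are made up for the example; the explicit `s !== ""` predicate should behave the same as `.filter(String)` here, since only the empty string converts to a falsy value:

```javascript
// The capturing group makes split() keep each matched delimiter as its own entry.
const parts = "a,b;;c".split(/([,;])/);
// An empty string appears between the two adjacent semicolons.
console.log(parts);

// Drop the empty strings explicitly instead of relying on String coercion.
const nonEmpty = parts.filter((s) => s !== "");
console.log(nonEmpty);
```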

Upvotes: 8

Amadan

Reputation: 198324

Use positive lookahead:

"asdf a  b c2 ".split(/(?= )/)
// => ["asdf", " a", " ", " b", " c2", " "]

Post-edit EDIT: As I said in the comments, the lack of lookbehind makes this a bit trickier. If all the words consist only of word characters (letters, digits and underscores), you can fake lookbehind using the \b word boundary matcher:

"asdf a  b c2 ".split(/(?= )|\b/)
// => ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]

but as soon as you get some punctuation in, it breaks down, since it does not only break on spaces:

"asdf-eif.b".split(/(?= )|\b/)
// => ["asdf", "-", "eif", ".", "b"]

If your words contain non-word characters that you don't want to break on, then I would suggest a postprocessing method instead.

Post-think EDIT: This is based on JamesA's original idea, but refined to not use jQuery, and to correctly split:

function chop(str) {
  var result = [];
  var pastFirst = false;
  str.split(' ').forEach(function(x) {
    if (pastFirst) result.push(' ');
    if (x.length) result.push(x);
    pastFirst = true;
  });
  return result;
}
chop("asdf a  b c2 ")
// => ["asdf", " ", "a", " ", " ", "b", " ", "c2", " "]
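The lookbehind that this answer describes as missing was later added to the language in ES2018. Assuming the target engine supports it, a sketch like the following splits at every position directly after or directly before a space, without also breaking on punctuation. The function name `splitAroundSpaces` is made up for this example:

```javascript
// Assumes an engine with ES2018 lookbehind support.
// Both alternatives are zero-width, so no characters are consumed by the split.
function splitAroundSpaces(str) {
  return str.split(/(?<= )|(?= )/);
}

console.log(splitAroundSpaces("asdf a  b c2 "));
console.log(splitAroundSpaces("asdf-eif.b c"));
```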

Upvotes: 10

JamesA

Reputation: 375

You could use a little jQuery:

var toSplit = "asdf a  b c2 ".split(" ");
// Replace every empty entry left behind by split() with a single space.
$.each(toSplit, function(index, value) {
    if (value === '') { toSplit[index] = ' '; }
});

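This replaces each empty entry with a single space, giving ["asdf", "a", " ", "b", "c2", " "], so the other elements carry no leading spaces. Note that the single spaces between words are still dropped, so this is not quite the array asked for in the question.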

Upvotes: 0
