tuomassalo

Reputation: 9121

Split string in JavaScript by lines, preserving newlines?

How would I split a JavaScript string such as "foo\nbar\nbaz" into an array of lines while preserving the newlines? I'd like to get ['foo\n', 'bar\n', 'baz'] as output.

I'm aware there are numerous possible answers - I'm just curious to find a stylish one.

With Perl I'd use a zero-width lookbehind assertion, split /(?<=\n)/, but lookbehind is not supported in JavaScript regexes.
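
Lookbehind assertions were later added to the language in ES2018, so on an engine that implements them (an assumption worth checking for older browsers) the Perl idiom carries over almost verbatim. A minimal sketch, with a made-up helper name:

```javascript
// Assumes an ES2018+ engine with lookbehind support;
// older engines reject this regex with a SyntaxError.
// splitAfterNewline is a hypothetical name used only for illustration.
function splitAfterNewline(s) {
    // Zero-width split point after every \n, so each \n stays attached to its line
    return s.split(/(?<=\n)/);
}

console.log(splitAfterNewline("foo\nbar\nbaz")); // ["foo\n", "bar\n", "baz"]
```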

PS. Extra points for handling different line endings (at least \r\n) and handling the missing last newline (as in my example).

Upvotes: 4

Views: 3802

Answers (5)

Sam Watkins

Reputation: 8379

The other answers and answers in comments are all flawed in different ways. I needed a function that works correctly on any string or file.

Here is a simple and correct answer:

function split_lines(s) {
    return s.match(/[^\n]*\n|[^\n]+/g);
}

var input = "foo\r\n\nbar\n\r\nba\rz\r\r\r";

var a = split_lines(input);

Array(5) [ "foo\r\n", "\n", "bar\n", "\r\n", "ba\rz\r\r\r" ]

It effectively splits at each newline \n but includes the \n, and includes a final line without trailing \n if and only if it is not empty. It includes all input characters in the output. We don't need any special treatment for \r.
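
One edge case worth knowing: String.prototype.match with a global regex returns null rather than an empty array when nothing matches, which is what happens here for the empty string. A minimal sketch of a guarded variant (the name split_lines_safe and the `|| []` fallback are additions for illustration, not part of the function above):

```javascript
// Same regex as split_lines; the `|| []` fallback is an added assumption
// about what callers want for empty input.
function split_lines_safe(s) {
    return s.match(/[^\n]*\n|[^\n]+/g) || [];
}

console.log(split_lines_safe("foo\nbar\nbaz")); // ["foo\n", "bar\n", "baz"]
console.log(split_lines_safe("foo\n"));         // ["foo\n"]
console.log(split_lines_safe(""));              // []
```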

I've tested this on a large chunk of random data: it preserves all input characters, and \n occurs only at the end of a line.

Here's a test script:

function split_lines(s) {
    return s.match(/[^\n]*\n|[^\n]+/g);
}

function gen_random_string(n, ncharset=256, nlprob=0.05, crprob=0.05) {
    var s = "";
    for (let i = 0; i < n; ++i) {
        var r = Math.random();
        if (r < nlprob)
            s += "\n";
        else if (r < nlprob + crprob)
            s += "\r";
        else {
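            // Rescale r from [nlprob + crprob, 1) to roughly [0, 1) so cc is spread over the charset
            var cc = Math.floor((r - nlprob - crprob) / (1 - nlprob - crprob) * ncharset);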
            var c = String.fromCharCode(cc);
            s += c;
        }
    }
    return s;
}

function test(...args) {
    var s = gen_random_string(...args);
    console.log(`generated random string of length ${s.length} with args:`, ...args);

    var ok = true, ok1;
    var a = split_lines(s);
    console.log(`split into ${a.length} lines`);

    ok1 = s === a.join('');
    ok = ok && ok1;
    console.log("split lines combine to give the original string?", ok1 ? "OK" : "FAIL");
    for (var i = 0; i < a.length; ++i) {
        var s1 = a[i];
        ok1 = s1.endsWith("\n") || i == a.length-1;
        ok = ok && ok1;
        ok1 = !s1.slice(0, -1).includes("\n");
        ok = ok && ok1;
    }
    console.log("tested each line other than the last ends with \\n");
    console.log("tested each line does not contain \\n before the last character");
    console.log("Final result", ok ? "OK" : "FAIL");
}

test(10000, 256);
test(10000, 65536);

Upvotes: 3

Hemlock

Reputation: 6208

I'd stay away from split with regular expressions, since IE has a broken implementation of it. Use match instead.

"foo\nbar\nbaz".match(/^.*(\r?\n|$)/mg)

Result: ["foo\n", "bar\n", "baz"]
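
A quick check of the same pattern on other inputs, as a sketch (the helper name matchLines is made up for illustration). With \r\n endings the \r stays attached to its line. With a trailing newline the pattern also yields one zero-length match at the end of the string, which may need filtering:

```javascript
// Hypothetical wrapper around the pattern from this answer
function matchLines(s) {
    return s.match(/^.*(\r?\n|$)/mg);
}

console.log(matchLines("foo\r\nbar")); // ["foo\r\n", "bar"]
console.log(matchLines("foo\n"));      // ["foo\n", ""]

// For \n or \r\n input, genuinely empty lines show up as "\n" or "\r\n",
// so dropping "" only removes the zero-length match at the end.
console.log(matchLines("foo\n").filter(line => line !== "")); // ["foo\n"]
```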

Upvotes: 1

Ahmad Mageed

Reputation: 96557

You can perform a global match with this pattern: /[^\n]+(?:\r?\n|$)/g

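It matches one or more non-newline characters, followed by either an optional \r and a \n, or the end of the string. Because at least one character is required before the line ending, empty lines are skipped, as the result below shows.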

var input = "foo\r\n\nbar\nbaz";
var result = input.match(/[^\n]+(?:\r?\n|$)/g);

Result: ["foo\r\n", "bar\n", "baz"]

Upvotes: 3

Sam Greenhalgh

Reputation: 6136

How about this?

"foo\nbar\nbaz".split(/^/m);

Result: ["foo\n", "bar\n", "baz"]

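One thing to watch with this approach, sketched below: in multiline mode ^ matches after any line terminator, and \r counts as one, so a \r\n pair ends up split in two. The variable names are only for illustration:

```javascript
// \n-only input: behaves as shown in the answer
const lfParts = "foo\nbar\nbaz".split(/^/m);
console.log(lfParts);   // ["foo\n", "bar\n", "baz"]

// \r\n input: ^ also matches between \r and \n, so the pair is separated
const crlfParts = "foo\r\nbar".split(/^/m);
console.log(crlfParts); // ["foo\r", "\n", "bar"]
```
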
Upvotes: 3

caleb

Reputation: 1637

One simple but crude method is to first replace each "\n" with two special characters, split on the second one, and then replace the first with "\n" in each piece. This only works if neither special character occurs in the input. It is neither efficient nor elegant, but it works.
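
The steps above could be sketched like this. The helper name and the choice of two private-use code points as the special characters are assumptions made for illustration, and the code does not check that they are absent from the input:

```javascript
// Hypothetical helper. KEEP and CUT are arbitrary private-use characters,
// assumed (not verified) to never appear in the input string.
function splitViaPlaceholders(s) {
    const KEEP = "\uE000"; // stands in for the newline that stays on the line
    const CUT = "\uE001";  // marks the position to split at
    const parts = s
        .replace(/\n/g, KEEP + CUT)             // step 1: "\n" -> two special characters
        .split(CUT)                             // step 2: split on the second one
        .map(part => part.replace(KEEP, "\n")); // step 3: restore "\n" from the first
    // A trailing newline (or empty input) leaves one empty piece at the end; drop it
    if (parts[parts.length - 1] === "") parts.pop();
    return parts;
}

console.log(splitViaPlaceholders("foo\nbar\nbaz")); // ["foo\n", "bar\n", "baz"]
```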

Upvotes: 1
