Reputation: 9121
How would I split a JavaScript string such as foo\nbar\nbaz into an array of lines while preserving the newlines? I'd like to get ['foo\n', 'bar\n', 'baz'] as output.
I'm aware there are many possible answers; I'm just curious to find a stylish one.
With Perl I'd use a zero-width lookbehind assertion, split /(?<=\n)/, but JavaScript regexes did not support lookbehind assertions at the time.
PS. Extra points for handling different line endings (at least \r\n) and for handling a missing final newline (as in my example).
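Lookbehind assertions were added to JavaScript regular expressions in ES2018 and are supported by current engines such as V8 (Node.js, Chrome), so the Perl-style split from the question now works as written; support in older browsers varies. A minimal sketch, assuming an ES2018+ runtime (splitKeepNewlines is a hypothetical name):

```javascript
// Split a string into lines, keeping each line's terminating "\n".
// Assumes an ES2018+ engine (regex lookbehind support).
function splitKeepNewlines(s) {
  // (?<=\n) is a zero-width match at every position just after a "\n",
  // so split() cuts there without consuming the newline itself.
  return s.split(/(?<=\n)/);
}

console.log(splitKeepNewlines("foo\nbar\nbaz"));   // [ 'foo\n', 'bar\n', 'baz' ]
console.log(splitKeepNewlines("foo\r\n\nbar\n")); // [ 'foo\r\n', '\n', 'bar\n' ]
```

Because the cut happens only after \n, the \r of a \r\n ending stays on its line, and a missing final newline needs no special handling. Unlike match(), split() never returns null; an empty string gives [""].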
Upvotes: 4
Views: 3802
Reputation: 8379
The other answers and answers in comments are all flawed in different ways. I needed a function that works correctly on any string or file.
Here is a simple and correct answer:
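function split_lines(s) {
    // Each match is either a run of non-"\n" characters ending in "\n",
    // or a final non-empty run with no trailing "\n".
    // match() returns null when nothing matches (empty input), so fall back to [].
    return s.match(/[^\n]*\n|[^\n]+/g) || [];
}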
const input = "foo\r\n\nbar\n\r\nba\rz\r\r\r";
const a = split_lines(input);
// a: Array(5) [ "foo\r\n", "\n", "bar\n", "\r\n", "ba\rz\r\r\r" ]
It effectively splits at each newline (\n) but keeps the \n at the end of its line, and it includes a final line without a trailing \n if and only if that line is not empty. Every input character appears in the output. No special treatment is needed for \r: in a \r\n ending, the \r is simply part of the line that the \n terminates.
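For comparison, the same splitting rule can be written without a regular expression: scan for each \n with indexOf, cut just after it, and keep whatever follows the last newline only if it is non-empty. This is a sketch of an alternative, not the answer's method; splitLinesLoop is a hypothetical name, and it returns [] rather than null for an empty string.

```javascript
// Non-regex equivalent of s.match(/[^\n]*\n|[^\n]+/g), except that an
// empty input yields [] instead of null.
function splitLinesLoop(s) {
  const lines = [];
  let start = 0;
  let nl;
  // Each "\n" ends a line and is kept as that line's last character.
  while ((nl = s.indexOf("\n", start)) !== -1) {
    lines.push(s.slice(start, nl + 1));
    start = nl + 1;
  }
  // Trailing text with no "\n" becomes a final line only if non-empty.
  if (start < s.length) {
    lines.push(s.slice(start));
  }
  return lines;
}

console.log(splitLinesLoop("foo\r\n\nbar\n\r\nba\rz\r\r\r"));
// [ 'foo\r\n', '\n', 'bar\n', '\r\n', 'ba\rz\r\r\r' ]
```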
I've tested this on a large chunk of random data: it preserves all input characters, and \n only occurs at the end of a line.
Here's a test script:
function split_lines(s) {
    return s.match(/[^\n]*\n|[^\n]+/g) || [];
}
function gen_random_string(n, ncharset=256, nlprob=0.05, crprob=0.05) {
    var s = "";
    for (let i = 0; i < n; ++i) {
        var r = Math.random();
        if (r < nlprob)
            s += "\n";
        else if (r < nlprob + crprob)
            s += "\r";
        else {
            // Rescale r from [nlprob + crprob, 1) to [0, 1), then to a char code in [0, ncharset).
            var cc = Math.floor((r - nlprob - crprob) / (1 - nlprob - crprob) * ncharset);
            var c = String.fromCharCode(cc);
            s += c;
        }
    }
    return s;
}
function test(...args) {
    var s = gen_random_string(...args);
    console.log(`generated random string of length ${s.length} with args:`, ...args);
    var ok = true, ok1;
    var a = split_lines(s);
    console.log(`split into ${a.length} lines`);
    // Joining the lines must reproduce the input exactly.
    ok1 = s === a.join('');
    ok = ok && ok1;
    console.log("split lines combine to give the original string?", ok1 ? "OK" : "FAIL");
    for (var i = 0; i < a.length; ++i) {
        var s1 = a[i];
        // Every line except the last must end with "\n"...
        ok1 = s1.endsWith("\n") || i === a.length - 1;
        ok = ok && ok1;
        // ...and no line may contain "\n" before its last character.
        ok1 = !s1.slice(0, -1).includes("\n");
        ok = ok && ok1;
    }
    console.log("tested each line other than the last ends with \\n");
    console.log("tested each line does not contain \\n before the last character");
    console.log("Final result", ok ? "OK" : "FAIL");
}
test(10000, 256);
test(10000, 65536);
Upvotes: 3
Reputation: 6208
I'd stay away from split with regular expressions, since older versions of Internet Explorer have a buggy implementation of it. Use match instead.
"foo\nbar\nbaz".match(/^.*(\r?\n|$)/mg)
Result: ["foo\n", "bar\n", "baz"]
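One caveat with this pattern: when the string ends with a newline, the multiline ^ also matches at the very end, so the result gets an extra empty string ("foo\nbar\n" gives ["foo\n", "bar\n", ""]). A minimal sketch of a wrapper that drops that final empty match (matchLines is a hypothetical name):

```javascript
// Wraps the /^.*(\r?\n|$)/mg approach and removes the empty match that
// appears after a trailing newline (or for an empty input string).
function matchLines(s) {
  // Never null here: the pattern always matches at position 0.
  const lines = s.match(/^.*(\r?\n|$)/mg);
  if (lines.length > 0 && lines[lines.length - 1] === "") {
    lines.pop();
  }
  return lines;
}

console.log(matchLines("foo\nbar\n")); // [ 'foo\n', 'bar\n' ]
```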
Upvotes: 1
Reputation: 96557
You can perform a global match with this pattern: /[^\n]+(?:\r?\n|$)/g
It matches one or more non-newline characters, then either an optional \r followed by \n, or the end of the string.
var input = "foo\r\n\nbar\nbaz";
var result = input.match(/[^\n]+(?:\r?\n|$)/g);
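Result: ["foo\r\n", "bar\n", "baz"]
Note that the empty line (the lone \n after "foo\r\n") is not returned, because [^\n]+ requires at least one character.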
Upvotes: 3
Reputation: 6136
How about this?
"foo\nbar\nbaz".split(/^/m);
Result: ["foo\n", "bar\n", "baz"]
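A caveat relevant to the \r\n part of the question: in JavaScript's multiline mode, ^ matches after any line terminator, including a bare \r, so this split also cuts between the \r and \n of a Windows line ending. A quick check (the variable name is illustrative):

```javascript
// Multiline ^ matches after "\n", "\r", "\u2028" and "\u2029",
// so a "\r\n" pair is split into two pieces.
const parts = "foo\r\nbar".split(/^/m);
console.log(parts); // [ 'foo\r', '\n', 'bar' ]
```

With only \n line endings the result is as shown above; if the input may contain \r\n, a pattern that cuts only after \n, such as the ES2018 lookbehind split(/(?<=\n)/), avoids this.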
Upvotes: 3
Reputation: 1637
One simple but crude method is to first replace each "\n" with two special characters, split on the second one, and then replace the first one with "\n" after splitting. It's neither efficient nor elegant, but it works.
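A sketch of that marker approach, assuming the two marker characters (here "\u0000" and "\u0001", chosen arbitrarily) never occur in the input; splitWithMarkers, MARK_A and MARK_B are hypothetical names:

```javascript
// Placeholder characters; the approach breaks if the input contains them.
const MARK_A = "\u0000"; // stands in for the newline itself
const MARK_B = "\u0001"; // marks the split point

function splitWithMarkers(s) {
  return s
    .replace(/\n/g, MARK_A + MARK_B) // each "\n" becomes A followed by B
    .split(MARK_B)                   // cut right after each A
    // Each piece contains at most one MARK_A, at its end; turn it back into "\n".
    .map(part => part.replace(MARK_A, "\n"));
}

console.log(splitWithMarkers("foo\nbar\nbaz")); // [ 'foo\n', 'bar\n', 'baz' ]
```

Like split() in general, this leaves a trailing empty string when the input ends with a newline ("foo\n" gives ["foo\n", ""]).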
Upvotes: 1