CantV

Reputation: 79

String will not split using Regex

As seen in the snippet, I have a regex that should identify any empty lines. (I'm aware I could just split on \n\n, but that doesn't suit my purposes.) I've tested it in a word editor, and the find tool picks up every empty line. But in JS, I still get the entire string back as a single element. What am I missing here?

const mockData = `This is some fake data
with multiple sentences
  
and line breaks`;

const newArr = mockData.split(/^\s*$/);

console.log(newArr[0]);

Upvotes: 0

Views: 47

Answers (2)

VLAZ

Reputation: 29115

You have a multiline string but aren't using the m (multiline) flag. Without it, ^ and $ match only at the start and end of the entire string, so you would only get a split if the whole string consisted of whitespace:

//multiline - all whitespace
const mockData = `
 
`;

const newArr = mockData.split(/^\s*$/);

console.log(newArr);
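To see the difference in isolation, here is a small sketch comparing the same pattern with and without the m flag, using `RegExp.prototype.test` and `String.prototype.search`. The sample string `text` and the variable names are made up for illustration:

```javascript
// Hypothetical sample: a whitespace-only line between two lines of text.
const text = "first line\n  \nthird line";

const wholeString = /^\s*$/; // no m flag: ^ and $ anchor to the whole string
const perLine = /^\s*$/m;    // m flag: ^ and $ anchor to each line

// The string as a whole is not whitespace-only, so this finds nothing.
console.log(wholeString.test(text)); // false
// With m, the middle line qualifies.
console.log(perLine.test(text)); // true
// search() reports where the first match starts: the blank line at index 11.
console.log(text.search(perLine)); // 11
console.log(text.search(wholeString)); // -1
```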

With the m flag, ^ and $ instead match at the start and end of each line, so the regex now splits on lines that are either empty or consist only of whitespace:

const mockData = `This is some fake data
with multiple sentences
  
and line breaks`;

const newArr = mockData.split(/^\s*$/m);

console.log(newArr);
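One thing worth knowing about the m-flag version: the blank line is removed, but the line breaks on either side of it stay attached to the neighbouring pieces, because `$` matches just before a newline. The sketch below makes that visible with `JSON.stringify` and then trims each piece; trimming is one possible cleanup, assumed here for illustration, not the only one:

```javascript
// Same data as the example above, written with explicit \n escapes so the
// whitespace-only line ("  ") is visible.
const mockData = "This is some fake data\nwith multiple sentences\n  \nand line breaks";

const parts = mockData.split(/^\s*$/m);
// Each piece still carries the newline next to the blank line:
// ["This is some fake data\nwith multiple sentences\n","\nand line breaks"]
console.log(JSON.stringify(parts));

// trim() strips the leftover leading/trailing newlines from each piece.
const cleaned = parts.map((part) => part.trim());
// ["This is some fake data\nwith multiple sentences","and line breaks"]
console.log(JSON.stringify(cleaned));
```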

If you intend to split at every newline and also drop empty lines without leaving blank entries, you can leave out the ^ and $ characters entirely, since they cause more trouble than they solve: because $ also matches right before a newline, the engine can end a match (and therefore split) just before a newline, leaving that newline attached to the next piece. Instead of working around that with more regex, just split on whitespace followed by newlines, or newlines followed by whitespace.

const mockData = `This is some fake data
with multiple sentences
  
and line breaks`;

const newArr = mockData.split(/\s*[\r\n]+|[\r\n]+\s*/);

console.log(newArr);
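The same pattern also copes with Windows-style \r\n line endings, since both alternatives accept any run of \r and \n characters. What it does not do is drop the empty strings produced by a line break at the very start or end of the input. A minimal sketch, with made-up sample strings and a hypothetical `lineSplitter` name, that adds `.filter(Boolean)` for that case:

```javascript
const lineSplitter = /\s*[\r\n]+|[\r\n]+\s*/;

// Windows line endings plus a whitespace-only line.
const windowsText = "a\r\nb\r\n \r\nc";
console.log(JSON.stringify(windowsText.split(lineSplitter))); // ["a","b","c"]

// A leading or trailing line break leaves an empty string at that edge...
const padded = "\nfoo\nbar\n";
console.log(JSON.stringify(padded.split(lineSplitter))); // ["","foo","bar",""]

// ...which filter(Boolean) removes, since "" is falsy.
console.log(JSON.stringify(padded.split(lineSplitter).filter(Boolean))); // ["foo","bar"]
```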

With this you don't need to use the multiline flag, since you never use the behaviour it introduces.

Also, I should note that [\r\n]+ is a slight cheat on my part. Line endings are either \r\n or just \n; you will very likely never encounter a lone \r. The precise regex is therefore \r?\n, which I find ugly, especially if you try to repeat it: (\r?\n)+. A character class is ever so slightly less precise, but in a way that should never change the result.
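If you do prefer the exact \r?\n form, note that inside `split` the repeated group should be non-capturing: `String.prototype.split` splices any captured text into the result array. A short sketch with a made-up sample string:

```javascript
const text = "one\r\ntwo\n\nthree";

// Capturing group: the last captured line ending is inserted between pieces.
const withCapture = text.split(/(\r?\n)+/);
console.log(JSON.stringify(withCapture)); // ["one","\r\n","two","\n","three"]

// Non-capturing group (?:...): only the pieces are returned.
const withoutCapture = text.split(/(?:\r?\n)+/);
console.log(JSON.stringify(withoutCapture)); // ["one","two","three"]
```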

Upvotes: 1

mplungjan
mplungjan

Reputation: 178422

Using the multiline flag works better:

const newArr = mockData.split(/\s*$/m);

Take your pick:

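// Optional whitespace at the start of a line, or before the end of a line
const re1 = /^\s*|\s*$/m;
// A whole line that is empty or whitespace-only
const re2 = /^\s*$/m;
// Optional whitespace right before the end of any line
const re3 = /\s*$/m;

const mockData = `This is some fake data
with multiple sentences

and line breaks`;

const newArr1 = mockData.split(re1);
console.log(JSON.stringify(newArr1));
const newArr2 = mockData.split(re2);
console.log(JSON.stringify(newArr2));
const newArr3 = mockData.split(re3);
console.log(JSON.stringify(newArr3));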
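To make the choice concrete, here is what each pattern returns for that input (where the blank line contains no spaces). They behave quite differently: re2 splits only at the blank line, while re1 and re3 also split at ordinary line ends, and re1 leaves a stray "\n" and an empty string in the result. The input is rewritten with explicit \n escapes so the sketch stands alone:

```javascript
const re1 = /^\s*|\s*$/m;
const re2 = /^\s*$/m;
const re3 = /\s*$/m;

// Same text as above; the third line is completely empty.
const mockData = "This is some fake data\nwith multiple sentences\n\nand line breaks";

console.log(JSON.stringify(mockData.split(re1)));
// ["This is some fake data","\n","with multiple sentences","","and line breaks"]
console.log(JSON.stringify(mockData.split(re2)));
// ["This is some fake data\nwith multiple sentences\n","\nand line breaks"]
console.log(JSON.stringify(mockData.split(re3)));
// ["This is some fake data","\nwith multiple sentences","\nand line breaks"]
```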

Upvotes: 0
