kyw

Reputation: 7533

Regex: Match all newlines except first ones

Let's say I have this block of text:

* hello

  
world



hello again

Using a regular expression in JavaScript, how do I replace every blank line between paragraphs with \, except the first blank line in each run, in a way that works on all platforms?

So effectively the result would be,

* hello

\
world

\
\
hello again
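For reference, here is a plain loop (no regex) that produces the expected output above, which can be handy for checking regex-based answers. It is only a sketch under stated assumptions: the function name `markExtraBlankLines` is hypothetical, whitespace-only lines count as blank, and line endings are assumed to be `\n` or `\r\n` (they are preserved as-is).

```javascript
// Hypothetical reference helper (not from the question): replaces every
// blank line after the first one in a run with a single backslash.
function markExtraBlankLines(text) {
  // A capturing group makes split() keep the separators ("\n" or "\r\n")
  // at the odd indices, so the original line endings survive the join.
  const parts = text.split(/(\r?\n)/);
  let previousWasBlank = false;
  for (let i = 0; i < parts.length; i += 2) {
    // An empty string after a final newline is not a real line; stop there.
    if (i === parts.length - 1 && parts[i] === "") break;
    const isBlank = parts[i].trim() === ""; // whitespace-only counts as blank
    if (isBlank && previousWasBlank) {
      parts[i] = "\\"; // second and later blank lines in a run
    }
    previousWasBlank = isBlank;
  }
  return parts.join("");
}

const input = "* hello\n\n  \nworld\n\n\n\nhello again";
console.log(markExtraBlankLines(input));
```

Splitting with a capturing group, rather than splitting on `\n` and joining with `\n`, is what keeps `\r\n` input intact.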

Upvotes: 3

Views: 144

Answers (4)

trincot

Reputation: 350127

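You could use a lookbehind assertion to account for the preceding empty line, which should remain untouched. This needs no callback and does not hard-code a particular newline character (\r or \n); it relies only on the ^ and $ line anchors. Note, however, that in JavaScript ^ and $ treat the \r and the \n of a \r\n pair as two separate line breaks, so input with \r\n line endings may need to be normalized to \n first: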

const s = `* hello

  
world



hello again
`;

const res = s.replace(/(?<=^ *$\s+?^) *$/gm, "\\");
console.log(res);
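A quick check of the lookbehind version against the expected output from the question. This is a sketch that assumes `\n` line endings and a runtime with ES2018 lookbehind support (for example, a current Node.js); `input` and `expected` are illustrative names.

```javascript
// Sketch: run the lookbehind replacement on the question's sample
// (including the spaces-only line) and compare with the expected output.
// Assumes "\n" line endings and ES2018 lookbehind support.
const input = "* hello\n\n  \nworld\n\n\n\nhello again\n";
const expected = "* hello\n\n\\\nworld\n\n\\\n\\\nhello again\n";

// (?<=^ *$\s+?^)  lookbehind: the previous line is empty or spaces-only
//  *$             the current line is empty or spaces-only; replace it with "\"
const res = input.replace(/(?<=^ *$\s+?^) *$/gm, "\\");

console.log(res === expected); // true
```

The spaces-only line is consumed by ` *` and replaced entirely, while a truly empty line produces a zero-width match at its start, so the backslash is inserted there.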

See the execution time comparison on JSBench.

If you need to support platforms that do not fully implement ECMAScript 2018 (which introduced lookbehind assertions), you could first trim trailing white space from every line and then apply the following regex-based replacement:

const s = `* hello
   
  
world
   
  
   
hello again
`;

const res = s.replace(/[ \t]+$/gm, "").replace(/^\s+?$/gm, "$&\\");
console.log(res);

If you are certain your input does not have any lines with only white space, then you can omit the first replace call...
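To see why the trimming step matters when some blank lines contain only spaces, this sketch runs the second regex with and without it. The input is a shortened, hypothetical variant of the sample above.

```javascript
// Sketch: effect of the first replace() call (trimming trailing spaces/tabs)
// on ^\s+?$ when "blank" lines contain only whitespace.
const input = "* hello\n   \n  \nworld";

// With the trim step: only the second blank line gets a backslash.
const withTrim = input.replace(/[ \t]+$/gm, "").replace(/^\s+?$/gm, "$&\\");

// Without it: each whitespace-only line matches on its own, so the first
// blank line is marked too.
const withoutTrim = input.replace(/^\s+?$/gm, "$&\\");

console.log(JSON.stringify(withTrim));    // "* hello\n\n\\\nworld"
console.log(JSON.stringify(withoutTrim)); // "* hello\n   \\\n  \\\nworld"
```

After trimming, each match is the newline of an empty line, and `$&\\` appends the backslash right after it, i.e. at the start of the following empty line.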

Upvotes: 3

bobble bubble

Reputation: 18490

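You could optionally capture a non-newline character (what the dot matches by default) followed by an optional newline, just before the newline being matched, and use a callback to check whether the first group is set. If it is, return the full match unchanged; otherwise, prepend a backslash to the newline.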

const s = `* hello


world



hello again`;

const res = s.replace(/(.\n?)?\n/g, (m0, m1) => m1 ? m0 : '\\\n');

console.log(res);

  • to prevent a possible match at the beginning, use a lookahead: (.\n?)?(?!^)\n
  • taking lines with horizontal space into account: (^[ \t]*|\S\n?)?[ \t]*(?!^)\n
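A sketch of what the callback receives for each match, followed by the two variants from the bullet points applied with the same callback. The names `keepOrMark`, `leading` and `spaced` are illustrative, not from the answer.

```javascript
// Sketch: what the callback sees, plus the two bullet variants.
// keepOrMark mirrors the answer's callback (m1 ? m0 : '\\\n').
const keepOrMark = (m0, m1) => (m1 ? m0 : "\\\n");

// Each match either starts with the last character of a text line (group 1
// set, returned unchanged) or is a bare newline (group 1 unset, marked).
const s = "* hello\n\n\nworld\n\n\n\nhello again";
const matches = [...s.matchAll(/(.\n?)?\n/g)].map(([m0, m1]) => ({
  text: m0,
  kept: Boolean(m1),
}));
console.log(matches.length, matches.filter((m) => m.kept).length); // 5 2

// Variant 1: (?!^) stops a newline at the very start of the string from
// being marked (without the m flag, ^ only matches at index 0).
const leading = "\nhello\n\n\nworld";
const plain = leading.replace(/(.\n?)?\n/g, keepOrMark);
const guarded = leading.replace(/(.\n?)?(?!^)\n/g, keepOrMark);
console.log(JSON.stringify(plain));   // "\\\nhello\n\n\\\nworld"
console.log(JSON.stringify(guarded)); // "\nhello\n\n\\\nworld"

// Variant 2: spaces or tabs on a blank line are absorbed by [ \t]*, so a
// spaces-only line is treated like an empty one.
const spaced = "* hello\n\n  \nworld\n\n\n\nhello again";
const horiz = spaced.replace(/(^[ \t]*|\S\n?)?[ \t]*(?!^)\n/g, keepOrMark);
console.log(JSON.stringify(horiz)); // "* hello\n\n\\\nworld\n\n\\\n\\\nhello again"
```

Both variants are used without the `m` flag; with it, `(?!^)` would also reject every newline that directly follows another newline.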

Upvotes: 6

The fourth bird

Reputation: 163207

If you can use a lookbehind assertion in JavaScript, you could match a newline and then check that what is directly to the left of the current position is a non-newline character followed by three or more newlines.

const s = `* hello


world



hello again`;

const regex = /\n(?<=.\n{3,})/gm;

console.log(s.replace(regex, "\\\n"));
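A sketch listing which newlines the lookbehind selects (by index) on the sample, then applying the replacement. It assumes ES2018 lookbehind support. The original regex also sets the `m` flag; since the pattern contains no `^` or `$`, that flag has no effect, so it is omitted here.

```javascript
// Sketch: inspect which newlines /\n(?<=.\n{3,})/g selects.
// Assumes ES2018 lookbehind support; no m flag, since there are no anchors.
const re = /\n(?<=.\n{3,})/g;
const s = "* hello\n\n\nworld\n\n\n\nhello again";

// A newline is selected only if it ends a run of three or more newlines
// that directly follows a non-newline character.
const hits = [...s.matchAll(re)].map((m) => m.index);
console.log(hits); // [ 9, 17, 18 ]

const res = s.replace(re, "\\\n");
console.log(res === "* hello\n\n\\\nworld\n\n\\\n\\\nhello again"); // true

// "." also matches a space, so a spaces-only line breaks the run of newlines
// and is not marked here (unlike the "  " line in the question's sample).
console.log("a\n\n  \nb".replace(re, "\\\n") === "a\n\n  \nb"); // true
```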

Upvotes: 1

Michael M.

Reputation: 11070

Try using .split() to keep everything that shouldn't be converted to a \, then use .map() to turn each empty piece into a \, and finally recombine the pieces with .join():

let s = `* hello


world



hello again`;

s = s.split(/.*?(?<=\n)\n/g).map(x => x === '' ? '\\' : x).join('\n');
console.log(s)
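A sketch that prints the intermediate array from the `split()` call, to show where the empty strings that `map()` turns into backslashes come from.

```javascript
// Sketch: the intermediate array produced by split() in the answer above.
const s = "* hello\n\n\nworld\n\n\n\nhello again";

// The lazy .*? can only succeed by matching nothing (after consuming any
// character, the lookbehind (?<=\n) would fail), so the separator is in
// effect a newline whose previous character is also a newline.
const pieces = s.split(/.*?(?<=\n)\n/g);
console.log(JSON.stringify(pieces)); // ["* hello\n","","world\n","","","hello again"]

// Each empty string sits between two consecutive separators, i.e. it marks a
// blank line after the first one; map() turns it into a backslash.
const result = pieces.map((x) => (x === "" ? "\\" : x)).join("\n");
console.log(JSON.stringify(result)); // "* hello\n\n\\\nworld\n\n\\\n\\\nhello again"
```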

I benchmarked this answer against bobble bubble's answer in JSBench and found mine to be slightly faster.

Upvotes: 3
