tegandbiscuits

Reputation: 81

JavaScript regex: remove single line breaks within paragraphs

I've got text in this format:

word word,
word word.

word word
word word.

It isn't specific to that two-word format; each paragraph just has a line break after so many characters, rather than being one long string. I'm trying to turn each paragraph into that one long string, so it should look like this:

word word, word word.
word word word word.

If I use the code text.replace(/$\n(?=.)/gm, " ") and output the result to the terminal, I get text that looks like this:

 word word, word word.
 word word word word.

It's got an extra space at the start of the paragraph, but that's good enough for what I'm trying to do (although if there's a way to remove it in the same replace call, then that's good too). The problem is that when I output it to a textarea, the \n characters aren't removed, and I just get text that looks like this:

 word word,
 word word.

 word word
 word word.

I'm trying to do this all client side, currently running it in Firefox.

I'm not the best with regex, so this might be really simple and I just don't know how to do it. Any help would be really appreciated. Thanks!

Upvotes: 1

Views: 787

Answers (3)

Shanoor

Reputation: 13692

You probably missed some \r characters. Here's a way to match all sorts of newlines without leaving extra spaces:

var input = 'word word,\nword word.\n\nword word\nword word.';

var out = input
    // split into paragraphs wherever there are 2 or more consecutive newlines
    .split(/(?:\r\n|\n|\r){2,}/)
    // split each paragraph by newlines and join its lines with a space
    .map((v) => v.split(/\r\n|\n|\r/).join(' '))
    // drop empty entries (e.g. from leading or trailing blank lines)
    .filter((v) => v.trim())
    // join all paragraphs together with \n
    .join('\n');

$('#txt').val(out);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

<textarea id="txt"></textarea>
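
To check whether stray \r characters really are the cause, one option is to normalize the line endings first and then reuse the regex from the question unchanged. This is only a minimal sketch: normalizeNewlines is a hypothetical helper name, and the CRLF sample string is an assumption about what the real input might contain.

```javascript
// Hypothetical helper (not from the question): convert CRLF and lone CR to LF.
function normalizeNewlines(text) {
  return text.replace(/\r\n?/g, "\n");
}

// Assumed sample: the question's text, but with Windows-style line endings.
var crlfInput = "word word,\r\nword word.\r\n\r\nword word\r\nword word.";

// The question's regex never matches \r, so applied directly it leaves every \r in place.
var direct = crlfInput.replace(/$\n(?=.)/gm, " ");

// After normalizing, the same regex joins the wrapped lines
// (still with the leading space described in the question).
var normalized = normalizeNewlines(crlfInput).replace(/$\n(?=.)/gm, " ");

console.log(JSON.stringify(direct));
console.log(JSON.stringify(normalized));
```

If the first logged string still shows \r and the second does not, carriage returns are the likely reason the textarea keeps its line breaks.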

Upvotes: 1

Giuseppe Ricupero

Reputation: 6272

Below is a snippet of code that satisfies your request. I've also removed the leading whitespace (caused by the empty lines) by passing a callback function to replace:

var regex  = /([^.])\s+/g;

var input  = 'word word,\nword word.\n\nword word\nword word.';

var result = input.replace(regex, function(all, char) {
  return (char.match(/\s/)) ? char : char + ' ' ;
});

document.write('<b>INPUT</b> <xmp>' + input + '</xmp>');
document.write('<b>OUTPUT</b> <xmp>' + result + '</xmp>');

Regex Breakdown

([^.])        # Match any char that is not a literal dot '.'
              # and capture it in group $1
\s+           # 1 or more whitespace chars: trailing spaces (tabs too)
              # and all types of newlines (\r\n, \r, \n)
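
To see how the callback treats each match, the sketch below records every matched substring together with its replacement. traceReplace is a hypothetical helper added purely for illustration; the regex and the callback logic are the ones from the snippet above.

```javascript
var regex = /([^.])\s+/g;

// Hypothetical helper: runs the same replace and records each match with its replacement.
function traceReplace(input) {
  var steps = [];
  var output = input.replace(regex, function (all, char) {
    // If the captured char is itself whitespace (a blank line), keep only that char;
    // otherwise keep the char and collapse the following whitespace to one space.
    var replacement = /\s/.test(char) ? char : char + ' ';
    steps.push({ match: all, replacement: replacement });
    return replacement;
  });
  return { output: output, steps: steps };
}

var traced = traceReplace('word word,\nword word.\n\nword word\nword word.');
console.log(JSON.stringify(traced.steps));
console.log(JSON.stringify(traced.output));

// A single line break directly after a period is not matched, so it is kept.
console.log(JSON.stringify(traceReplace('one.\ntwo').output));
```

The last call points at a limitation worth keeping in mind: because [^.] cannot match the period, a wrapped line that happens to end in '.' inside a paragraph stays on its own line.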

NOTE

If for some reason you want to keep the leading whitespace, the code simplifies to the following:

var regex   = /([^.])\s+/g;
var replace = '$1 ';

var input   = 'word word,\nword word.\n\nword word\nword word.';

var result = input.replace(regex, replace);

document.write('<b>INPUT</b> <xmp>' + input + '</xmp>');
document.write('<b>OUTPUT</b> <xmp>' + result + '</xmp>');

Upvotes: 1

kurt

Reputation: 1156

A carriage return is \r, so you would need to match it as well:

text.replace(/$(\r|\n)(?=.)/gm, " ");
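
One caveat with Windows-style \r\n endings: the alternation above consumes only one character per match, so the \r of each pair can be left behind. Below is a minimal sketch of a variant that treats \r\n as a single break and also avoids the leading space in one replace call. unwrapParagraphs is a hypothetical name, and the sketch assumes that paragraphs are separated by completely empty lines.

```javascript
// Hypothetical helper name; not taken from any of the answers.
function unwrapParagraphs(text) {
  // Match each run of consecutive line breaks, treating \r\n as a single break.
  return text.replace(/(?:\r\n|\r|\n)+/g, function (run) {
    var breaks = run.match(/\r\n|\r|\n/g).length;
    // One break is a wrapped line; two or more mark a paragraph boundary.
    return breaks > 1 ? '\n' : ' ';
  });
}

var unix = 'word word,\nword word.\n\nword word\nword word.';
var windows = unix.replace(/\n/g, '\r\n');

console.log(JSON.stringify(unwrapParagraphs(unix)));
console.log(JSON.stringify(unwrapParagraphs(windows)));
```

Note that a single trailing line break at the end of the text would become a trailing space with this approach, and a "blank" line that contains spaces would not count as a paragraph boundary.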

Upvotes: 1
