aland

Reputation: 2004

Remove newlines unless preceded by a period

I need to remove extra newlines from some text, keeping only the newlines that are immediately preceded by a full stop / period character (.).

In the example text below, I only need to keep two newlines: one after "...arcu rhoncus." and one after "...ac in est."

Donec viverra mi quis quam pulvinar at malesuada arcu rhoncus.
Cum sociis natoque penatibus et magnis dis parturient montes, nascetur
ridiculus mus. In rutrum accumsan ultricies. Mauris vitae nisi at sem facilisis
semper ac in est.
Vivamus fermentum semper porta. Nunc diam velit, adipiscing ut tristique
vitae, sagittis vel odio. Maecenas convallis ullamcorper ultricies. Curabitur
ornare, ligula semper consectetur sagittis, nisi diam iaculis velit, id 
fringilla sem nunc vel mi.

I am using Notepad++ for this.

I can match what I want to keep with the pattern below, but I am not sure how to build the whole solution from it.

[.]$

Upvotes: 2

Views: 1280

Answers (3)

speakr

Reputation: 4209

As suggested in a comment, a negative look-behind works well. Search for this regular expression in Notepad++ and replace each match with a single space:

(?<!\.)\s*\r\n\s*

If your file only has \n instead of \r\n, just remove the \r from the pattern.

Note that if you use \r? to cover both cases, Notepad++ seems to match it non-greedily, so the \r won't be removed.

Result with Notepad++ v6.1.5 (UNICODE):

Donec viverra mi quis quam pulvinar at malesuada arcu rhoncus.
Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. In rutrum accumsan ultricies. Mauris vitae nisi at sem facilisis semper ac in est.
Vivamus fermentum semper porta. Nunc diam velit, adipiscing ut tristique vitae, sagittis vel odio. Maecenas convallis ullamcorper ultricies. Curabitur ornare, ligula semper consectetur sagittis, nisi diam iaculis velit, id fringilla sem nunc vel mi.
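
For anyone who wants to try the same pattern outside Notepad++, here is a minimal Python sketch. It assumes Windows-style \r\n line endings, and the function name join_lines is just something I made up for illustration. Python's re module is a different regex engine from the one in Notepad++, so treat this as a way to experiment with the idea rather than a guarantee of identical behaviour.

```python
import re

# Same pattern as above: a \r\n line break, plus any whitespace around it,
# that is not immediately preceded by a literal period.
LINE_BREAK_NOT_AFTER_PERIOD = re.compile(r"(?<!\.)\s*\r\n\s*")


def join_lines(text):
    """Replace every line break that does not follow a period with one space."""
    return LINE_BREAK_NOT_AFTER_PERIOD.sub(" ", text)


if __name__ == "__main__":
    sample = (
        "malesuada arcu rhoncus.\r\n"
        "Cum sociis natoque, nascetur\r\n"
        "ridiculus mus. Mauris vitae \r\n"
        "semper ac in est.\r\n"
        "Vivamus fermentum"
    )
    print(join_lines(sample))
```

The surrounding \s* also swallows trailing spaces before the break (like the one after "vitae" in the sample), so the joined words end up separated by exactly one space.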

Upvotes: 3

user1919238

Reputation:

Here is a non-look-behind method:

Search for:

([^.])(\r\n)+

And replace with:

\1 

Where \1 is followed by a space.

Note that the + is needed to match several newlines in a row. Without it, some of the newlines in such a run would be left unmatched.
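
As a rough way to check this search/replace pair, the same idea can be sketched in Python. The function name join_lines_capture is hypothetical, and \r\n line endings are assumed, as in the pattern above.

```python
import re

# One non-period character (captured as group 1) followed by one or more
# \r\n line breaks.
NON_PERIOD_THEN_BREAKS = re.compile(r"([^.])(\r\n)+")


def join_lines_capture(text):
    """Put back the captured character and replace the line breaks with a space."""
    return NON_PERIOD_THEN_BREAKS.sub(r"\1 ", text)


if __name__ == "__main__":
    print(repr(join_lines_capture("a.\r\nb\r\nc")))
    print(repr(join_lines_capture("b\r\n\r\nc")))
```

One thing to be aware of: [^.] also matches a newline character itself, so a blank line directly after a period ("a.\r\n\r\nb") comes out as "a.\r\n b", with the second line break turned into a space.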

Upvotes: 2

AdamL

Reputation: 13141

You need to use negative lookbehind and replace with space:

(?<!\.)\r\n

Another option, for someone who doesn't know this construct (or if lookbehind is not supported), would be to first replace all occurrences of \.\r\n with a distinct placeholder string such as <rnt>, then remove all remaining newlines, and finally replace <rnt> with a period followed by \r\n again.
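
The three-step placeholder idea needs no regex at all, so it can be sketched with plain string replacement. This is only an illustration under a few assumptions: the function name join_lines_with_placeholder is made up, the marker <rnt> must not already occur in the text, and the remaining newlines are replaced with a space (rather than removed outright) so that words on adjacent lines do not run together.

```python
def join_lines_with_placeholder(text, marker="<rnt>"):
    """Keep only the \\r\\n breaks that follow a period, using a temporary marker."""
    # Step 1: protect the line breaks we want to keep.
    protected = text.replace(".\r\n", marker)
    # Step 2: turn every remaining line break into a space (assumption, see above).
    joined = protected.replace("\r\n", " ")
    # Step 3: restore the protected line breaks.
    return joined.replace(marker, ".\r\n")


if __name__ == "__main__":
    print(repr(join_lines_with_placeholder("a.\r\nb\r\nc")))
```

Unlike the patterns with \s*, this version does not trim trailing spaces before a line break, so a line ending in a space will leave two spaces after joining.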

Upvotes: 2
