user2799546

Reputation: 21

PHP regex - remove only newlines that have no space before or after

I need to clean up texts that contain invalid newlines, inserted even inside words, as well as valid newlines between words, which have a leading or trailing space.

With PHP I am trying to remove, from a multiline text, only those newlines that are enclosed by characters, i.e. that have no space before or after them.

$textbefore = "text has newlines in wo\nrds and normal newlines \n bewtween words and again in wo\nrds";
$textafter = "text has newlines in words and normal newlines \n bewtween words and again in words";

I tried this:

$pattern="/(.{2}\n.{1})/m";

I have tried many patterns, but at best only the first occurrence is matched.

Any ideas are highly appreciated.

Upvotes: 1

Views: 337

Answers (2)

Teneff

Reputation: 32148

You can use negative lookahead and negative lookbehind:

/(?<!\s)\n(?!\s)/

It matches a newline that is neither preceded nor followed by whitespace.
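A minimal, runnable sketch of this pattern with preg_replace, which replaces every match by default (its limit argument defaults to -1), whereas preg_match stops at the first match. Because lookarounds are zero-width, the characters around each newline are not consumed, so two removable newlines separated by a single letter are both handled. The extra input "a\nb\nc" is a hypothetical illustration, not part of the question.

```php
<?php
// Remove newlines that have no whitespace directly before or after them.
// (?<!\s) : not preceded by whitespace (zero-width, consumes nothing)
// (?!\s)  : not followed by whitespace (zero-width, consumes nothing)
$pattern = '/(?<!\s)\n(?!\s)/';

$textbefore = "text has newlines in wo\nrds and normal newlines \n bewtween words and again in wo\nrds";

// preg_replace replaces all matches, not just the first one.
$textafter = preg_replace($pattern, '', $textbefore);
var_dump($textafter);

// Hypothetical extra input: two removable newlines separated by one letter.
// Since the lookarounds do not consume "b", both newlines are removed.
$adjacent = preg_replace($pattern, '', "a\nb\nc");
var_dump($adjacent);
```

The newline between "newlines " and " bewtween" survives because the lookbehind sees the space before it.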


Upvotes: 0

nickb

Reputation: 59699

You can simplify this into the following regex:

$textafter = preg_replace("/(?<=\S)\n|\n(?=\S)/", '', $textbefore);

The pattern matches either:

  1. (?<=\S)\n - A newline that is preceded by a character that is not whitespace, OR
  2. \n(?=\S) - A newline that is followed by a character that is not whitespace

When it finds either of these newlines, it replaces them with nothing (an empty string).

Running it on $textbefore produces the string:

string(82) "text has newlines in words and normal newlines 
 bewtween words and again in words"
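On the question's sample this pattern and the lookaround pattern in the first answer give the same result, but they are not equivalent: the first answer's pattern requires that neither neighbour of the newline be whitespace, while this one removes a newline as soon as one neighbour is non-whitespace. The sketch below uses a hypothetical input, "end\n start" (a letter before the newline, a space after), to show the difference; which behaviour is wanted depends on how the text was broken.

```php
<?php
// Hypothetical comparison of the two patterns from the answers.
$bothSides  = '/(?<!\s)\n(?!\s)/';    // no whitespace before AND none after
$eitherSide = '/(?<=\S)\n|\n(?=\S)/'; // non-whitespace before OR after

// A newline with a letter before it and a space after it.
$sample = "end\n start";

// Kept: the negative lookahead (?!\s) sees the space after the newline.
$keptNewline = preg_replace($bothSides, '', $sample);

// Removed: the lookbehind (?<=\S) sees the "d" before the newline.
$removedNewline = preg_replace($eitherSide, '', $sample);

var_dump($keptNewline, $removedNewline);
```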

Upvotes: 2
