Reputation: 21
I need to clean up texts that have invalid newlines added even within words, as well as valid newlines between words, which have a leading or trailing space.
With PHP I am trying to remove those newlines from a multiline text that are enclosed by characters, meaning there is no space before or after them.
$textbefore = "text has newlines in wo\nrds and normal newlines \n bewtween words and again in wo\nrds";
$textafter = "text has newlines in words and normal newlines \n bewtween words and again in words";
I tried this:
$pattern="/(.{2}\n.{1})/m";
I have tried all the patterns I could think of, but in the best cases only the first occurrence is matched.
Any ideas are highly appreciated.
Upvotes: 1
Views: 337
Reputation: 32148
You can use negative lookahead and negative lookbehind:
/(?<!\s)\n(?!\s)/
It will match a newline that has no whitespace directly before it and none directly after it.
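As a minimal sketch of how the pattern above could be applied, here it is passed to `preg_replace` with the sample string from the question. The variable names are simply taken from the question, and the pattern is single-quoted so that PCRE itself interprets `\n` and `\s`.

```php
<?php
// Sample input copied from the question.
$textbefore = "text has newlines in wo\nrds and normal newlines \n bewtween words and again in wo\nrds";

// Remove every newline that has no whitespace directly before or after it.
$textafter = preg_replace('/(?<!\s)\n(?!\s)/', '', $textbefore);

var_dump($textafter);
```

`preg_replace` replaces all matches by default, so both in-word newlines are removed while the newline surrounded by spaces is kept.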
Upvotes: 0
Reputation: 59699
You can simplify this into the following regex:
$textafter = preg_replace( "/(?<=\S)\n|\n(?=\S)/", '', $textbefore);
Which states that it must find:
(?<=\S)\n
- A newline that is preceded by a character that is not whitespace, OR
\n(?=\S)
- A newline that is followed by a character that is not whitespace
When it finds either of these newlines, it replaces them with nothing (an empty string).
You can see from this demo that this produces the string:
string(82) "text has newlines in words and normal newlines 
 bewtween words and again in words"
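One thing worth checking against your real data: because the two lookarounds are joined with OR, a newline that has whitespace on only one side is removed as well. If only newlines enclosed by non-whitespace on both sides should go, a stricter variant may fit better. The sketch below compares the two. The sample strings and the `$both` pattern are my own assumptions for illustration, not taken from the question.

```php
<?php
// Hypothetical inputs: the first two have whitespace on exactly one side of the newline.
$samples = ["one \ntwo", "one\n two", "o\nne"];

$either = '/(?<=\S)\n|\n(?=\S)/'; // pattern from this answer
$both   = '/(?<=\S)\n(?=\S)/';    // stricter variant (assumption about the intent)

$results = [];
foreach ($samples as $sample) {
    $results[] = [
        preg_replace($either, '', $sample),
        preg_replace($both, '', $sample),
    ];
}

// json_encode makes the remaining newlines visible as \n.
foreach ($results as $i => [$loose, $strict]) {
    printf("%s: either=%s both=%s\n", json_encode($samples[$i]), json_encode($loose), json_encode($strict));
}
```

With these inputs the OR pattern removes the newline in all three samples, while the stricter variant only removes the one inside "o\nne". On the sample string from the question both give the same result.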
Upvotes: 2