
Merge multiple HTML line breaks into one with PHP? Line-Breaks caused by P and BR tags

First part of question: p tag

I have a string that contains text with unnecessary line breaks caused by p tags, for example:

<p>hi everyone,</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>Here comes the content I wanted to write...</p>

I would like to filter these empty p tags and merge them into one:

<p>hi everyone,</p>
<p>&nbsp;</p>
<p>Here comes the content I wanted to write...</p>

How can this be done?

Thank you!


Second part of question: br tag

Sometimes the string contains br tags that cause line breaks as well, for example:

that is all I wanted to write.<br />
<br />
&nbsp;<br />
<br />
&nbsp;<br />
<br />
bye

This should become:

that is all I wanted to write.<br />
<br />
bye

Upvotes: 1

Views: 2311

Answers (1)

Phil Cross

Reputation: 9302

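Try using str_replace. Note that this removes the filler lines entirely rather than merging them, and it only matches when each one is followed by a \n newline: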

$content = str_replace(array("<p>&nbsp;</p>\n", "&nbsp;<br />\n"), array('', ''), $content);

To merge each run of empty paragraphs into one instead, use a regex:

$content = preg_replace('/((<p\s*\/?>\s*)&nbsp;(<\/p\s*\/?>\s*))+/im', "<p>&nbsp;</p>\n", $content);
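As a quick check, here is a minimal sketch that runs the paragraph regex above on the sample from the question. It assumes the lines are separated by \n; the variable name $merged is only for this example.

```php
<?php
// Sample input taken from the question, with "\n" line endings assumed.
$content = "<p>hi everyone,</p>\n<p>&nbsp;</p>\n<p>&nbsp;</p>\n<p>&nbsp;</p>\n<p>Here comes the content I wanted to write...</p>";

// Collapse each run of one or more "<p>&nbsp;</p>" paragraphs into a single one.
$merged = preg_replace('/((<p\s*\/?>\s*)&nbsp;(<\/p\s*\/?>\s*))+/im', "<p>&nbsp;</p>\n", $content);

echo $merged;
```

The three filler paragraphs come out as one, and the paragraphs with real text are left alone.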

And for the br tags:

$content = preg_replace('/(&nbsp;(<br\s*\/?>\s*)|(<br\s*\/?>\s*))+/im', "<br />\n", $content);
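One thing to watch: on the sample from the question, the BR regex above leaves a single `<br />` before "bye", while the expected output keeps two (one to end the line, one for the blank line). If that exact output matters, one possible variation is to match only runs of two or more breaks and replace them with exactly two. This is only a sketch; the pattern and the name $collapsed are my own assumptions, and it assumes \n line endings.

```php
<?php
// Sample input taken from the question, with "\n" line endings assumed.
$content = "that is all I wanted to write.<br />\n<br />\n&nbsp;<br />\n<br />\n&nbsp;<br />\n<br />\nbye";

// Match a run of two or more <br> tags (each optionally preceded by &nbsp;)
// and replace the whole run with exactly two breaks.
$collapsed = preg_replace('/(?:(?:&nbsp;)?<br\s*\/?>\s*){2,}/i', "<br />\n<br />\n", $content);

echo $collapsed;
```

A single `<br />` on its own is not touched, because the run has to contain at least two breaks. Be aware that a &nbsp; sitting directly in front of the first break of a run would be swallowed as well.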

EDIT: Here's why your regex works (hopefully this helps you understand it a bit :) ):

/((\\n\s*))+/im
^  ^^^ ^^  ^^^^
|  \|/ ||  ||\|
|   |  ||  || -- Flags
|   |  ||  |-- Regex End Character
|   |  ||  -- One or more of the preceding group
|   |  |-- Zero or more of the preceding character
|   |  -- Whitespace character (\s)
|   -- Newline Character (Escaped)
-- Regex Start Character

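Every pattern passed to PHP's preg functions must be wrapped in delimiters: the same character at the start and at the end (bracket-style delimiters such as ( ) or { } use the matching closing bracket instead). In this case, I've used the forward slash character.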

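The ( character opens a group and the matching ) closes it, so that a quantifier such as + can apply to the whole group. The newline character is \n. The pattern here is written as a single-quoted PHP string, where \\ stands for one literal backslash, so \\n reaches the regex engine as \n. Writing \n directly inside single quotes gives the same result.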
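A quick way to see how the \\n spelling behaves in a single-quoted PHP string is to compare it with the single-backslash form. This is a small sketch; the variable names are just for illustration.

```php
<?php
// Inside single quotes, '\\' is one backslash and '\n' stays as backslash + n,
// so these two literals are the same four-character string: / \ n /
$single = '/\n/';
$doubled = '/\\n/';
var_dump($single === $doubled); // bool(true)

// Either way the regex engine receives \n, which matches a newline character.
var_dump(preg_match($doubled, "line one\nline two")); // int(1)
```

So the doubled backslash is consumed by PHP's string parsing before the pattern ever reaches the regex engine.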

The whitespace character class is \s. It matches any single whitespace character (space, tab, newline and so on). The * character means zero or more of the preceding expression, so \s* matches zero or more whitespace characters.

The + symbol matches ONE or more of the preceding expression. In this case, the preceding expression is (\\n\s*), so as long as that expression occurs at least once, the preg_replace function will find something to replace.
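Putting the pieces together, here is a minimal sketch of that newline regex in action. The input string and the "\n" replacement are assumptions made for this example, since they are not shown above.

```php
<?php
// Hypothetical input: "a" and "b" separated by several blank or space-only lines.
$text = "a\n\n  \n\nb";

// One or more occurrences of "newline followed by optional whitespace"
// are replaced by a single newline (the replacement string is an assumption).
$result = preg_replace('/((\\n\s*))+/im', "\n", $text);

var_dump($result === "a\nb"); // bool(true)
```

Because \s also matches newlines, the first \n plus \s* already swallows the whole blank stretch, and the + just allows for further repeats.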

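The flags I've used, i and m, mean case-*I*nsensitive (not really needed for a newline expression) and *M*ultiline. The m flag makes the ^ and $ anchors match at the start and end of every line rather than only at the start and end of the whole string. This pattern contains neither anchor, so m has no effect here; a pattern can match across line breaks with or without it.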
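To see what the m flag actually changes, here is a small sketch using a made-up two-line string. The variable names are only for this example.

```php
<?php
$text = "first\nsecond";

// Without m, ^ matches only at the very start of the subject string.
$countWithoutM = preg_match_all('/^\w+/', $text, $plain);

// With m, ^ also matches immediately after each newline.
$countWithM = preg_match_all('/^\w+/m', $text, $multi);

echo $countWithoutM, ' ', $countWithM, "\n"; // 1 2

// A pattern without anchors can match across a line break even without m.
var_dump(preg_match('/first\s+second/', $text)); // int(1)
```

So m only matters when the pattern uses ^ or $.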

Upvotes: 3
