
Reputation: 15166

How do I remove <p><br></p> paragraphs containing whitespace, line breaks, and tabs from the end of a string with regex?

I'm still fairly new to regex and have the following problem:

My users post content that ends with one or more "line breaks". These "line breaks" are <p><br></p> paragraphs with varying amounts of whitespace between the tags. Sometimes a paragraph contains more than one <br>. Some examples:

<p>

<br> 
</p>
<p>
     <br> 
</p>
<p><br> <br> 
</p>
<p>
 <br> 
</p>

How can I remove these paragraphs from the end of each piece of content, while also removing the contained <br>s, spaces, line breaks, and tabs?

Upvotes: 1

Views: 157

Answers (1)

0b10011

Reputation: 18785

<?php
// Sample inputs: "foo" followed by one empty paragraph each
$strings = [];
$strings[] = 'foo<p>

<br> 
</p>';
$strings[] = 'foo<p>
     <br> 
</p>';
$strings[] = 'foo<p><br><br> 
</p>';
$strings[] = 'foo<p>
 <br> 
</p>';

foreach ($strings as $string) {
    // \s* matches any run of whitespace characters (" ", \t, \n, etc.), including none
    // (?:...)+ matches the group one or more times without capturing it
    // $ anchors the match to the end of the string
    $string = preg_replace('/(?:<p>\s*(?:<br>\s*)+<\/p>\s*)+$/', '', $string);

    echo $string . "\n---\n";
}

Output is:

foo
---
foo
---
foo
---
foo
---
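
If the content might also contain <br/>, <br />, or uppercase tags, the same idea can be wrapped in a small helper. This is only a sketch under those assumptions: the function name stripTrailingEmptyParagraphs is made up for illustration, and it does not handle <p> tags with attributes or &nbsp; entities.

```php
<?php
// Hypothetical helper (not part of the answer above): removes trailing
// paragraphs that contain only <br> tags and whitespace.
function stripTrailingEmptyParagraphs(string $html): string
{
    // <p>\s*              opening tag followed by optional whitespace
    // (?:<br\s*\/?>\s*)+  one or more <br>, <br/>, or <br /> tags
    // <\/p>\s*            closing tag followed by optional whitespace
    // (?:...)+\z          one or more such paragraphs, up to the very end
    // i                   case-insensitive, so <P> and <BR> also match
    $pattern = '/(?:<p>\s*(?:<br\s*\/?>\s*)+<\/p>\s*)+\z/i';

    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '', $html) ?? $html;
}

// Trailing empty paragraphs in mixed notation are removed.
echo stripTrailingEmptyParagraphs("foo<p>\n\t<br /> <BR>\n</p>\n<p><br/></p>\n"), "\n";

// An empty paragraph in the middle of the content is left alone.
echo stripTrailingEmptyParagraphs("<p>keep</p><p><br></p><p>text</p>"), "\n";
```

The first call prints foo; the second prints its input unchanged, because the empty paragraph is not at the end of the string.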

Upvotes: 1
