PHP Regex to remove HTML-Tags inside <pre></pre> code blocks

Question

I have a tricky string of HTML that includes several <pre> blocks. Each block contains code (say, Python), but the code is also decorated with HTML tags that should be removed.

For example:

Some text.
a = 5 

b = 3

More text
a2 = "text"
b = 3

final text

I would like to clean out all the HTML tags inside the <pre> blocks (these are likely to be basic tags: br, em, div, a, etc.). I do not need to parse the HTML; I know that regex cannot parse HTML. The desired result:

Some text.
a = 5
b = 3

More text
a2 = "text"
b = 3

final text

I'd like to do this using PHP (with something like preg_replace). For example:

$html = "         
         some_code = 1
         
         "; // Compacting things here, for brevity

$newHTML = preg_replace("/(.*?)<pre[^>]*>(.*?)<\/pre>(.*)/Us", "$1".strip_tags("$2", '')."$3", $html);
echo $newHTML;

This example code obviously doesn't work, since: (1) it would handle only one pre tag, and (2) the call strip_tags("$2", '') does not process the string in the right place. It is evaluated before preg_replace runs, so it just returns "$2" instead of receiving the captured text and manipulating it.
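
To make point (2) concrete, here is a small sketch of what happens. PHP evaluates the arguments of preg_replace before the function is called, so strip_tags only ever receives the literal string "$2" (or "$1" below). The input string and the simplified pattern are made up for illustration and are not the ones from the question.

```php
<?php
// Hypothetical input, for illustration only.
$html = '<pre><em>some_code</em> = 1</pre>';

// strip_tags() runs first and sees only the literal text "$2",
// which contains no tags, so it comes back unchanged.
$replacement = strip_tags('$2');
var_dump($replacement === '$2'); // bool(true)

// The same thing inside preg_replace(): the replacement is just "$1",
// so the captured text is inserted with its tags still in place.
$result = preg_replace('~<pre[^>]*>(.*?)</pre>~s', strip_tags('$1'), $html);
echo $result, "\n"; // <em>some_code</em> = 1
```

This is why the processing has to happen in a callback that runs once per match, after the capture groups are known.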

Any suggestions on how this could be done in PHP? Thanks.

anubhava · Accepted Answer

You will need to use preg_replace_callback and call strip_tags in the callback body:

preg_replace_callback('~(<pre[^>]*>)([\s\S]*?)(</pre>)~',
    function ($m) { return $m[1] . strip_tags($m[2], ['p', 'b', 'strong']) . $m[3]; },
    $s);

Some text.
a = 5
b = 3

More text
a2 = "text"
b = 3

final text

Note that the strip_tags call above strips all tags except p, b and strong. Passing the allowed tags as an array requires PHP 7.4 or later.
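
If the goal is to drop every tag inside the pre blocks, as the question asks, the same callback can presumably be used with strip_tags called without an allow-list. A minimal sketch follows; the sample markup in $s is an assumption, since the question's exact input is not reproduced here.

```php
<?php
// Hypothetical sample input; the markup is an assumption for illustration.
$s = "<p>Some text.</p>\n"
   . "<pre>a = 5<br>\n<em>b = 3</em></pre>\n"
   . "<p>More text</p>\n"
   . "<pre class=\"code\"><div>a2 = \"text\"</div>\nb = 3</pre>\n"
   . "<p>final text</p>";

// Strip every tag inside each <pre>...</pre> block, keeping the <pre>
// wrapper itself and everything outside the blocks untouched.
$clean = preg_replace_callback(
    '~(<pre[^>]*>)([\s\S]*?)(</pre>)~',
    function ($m) {
        return $m[1] . strip_tags($m[2]) . $m[3];
    },
    $s
);

echo $clean, "\n";
// <p>Some text.</p>
// <pre>a = 5
// b = 3</pre>
// <p>More text</p>
// <pre class="code">a2 = "text"
// b = 3</pre>
// <p>final text</p>
```

One caveat: strip_tags treats anything that looks like a tag as a tag, so code containing an unescaped < may lose text. This sketch assumes such characters are already encoded as &lt; in the HTML.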

RegEx Details:

(<pre[^>]*>): Match the opening <pre> tag, including any attributes, and capture it in group #1
([\s\S]*?): Match 0 or more of any character including newline (lazy), and capture this in group #2. [\s\S] matches any character including newline.
(</pre>): Match the closing </pre> tag and capture it in group #3
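
To see what each group captures, the pattern can be tried with preg_match on a single block. This is only a sketch; the input string is hypothetical.

```php
<?php
// Hypothetical single-block input, for illustration only.
$s = '<pre class="py"><em>b = 3</em></pre>';

// preg_match() fills $m with the full match at index 0 and the three
// capture groups at indexes 1 to 3.
preg_match('~(<pre[^>]*>)([\s\S]*?)(</pre>)~', $s, $m);

echo $m[1], "\n"; // <pre class="py">
echo $m[2], "\n"; // <em>b = 3</em>
echo $m[3], "\n"; // </pre>
```

Only group #2 is passed through strip_tags in the callback, which is why the pre tags themselves survive.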
