I am trying to write a preg_replace that strips all attributes from the allowed tags and removes all tags that are not in the allowed list.
A basic example: this
<p style="some styling here">Test<div class="button">Button Text</div></p>
should become:
<p>Test</p>
I have this working well, except for img and a tags: I need to keep the attributes of img and a tags (src, href and so on), and possibly of other tags too. Is there a way to set up two allow lists?
1) One list of tags that are allowed to stay but have their attributes removed.
2) One list of tags that are allowed and left alone.
3) All other tags are deleted.
Here is the script I am working on:
$string = '<p style="width: 250px;">This is some text<div class="button">This is the button</div><br><img src="waves.jpg" width="150" height="200" /></p><p><b>Title</b><br>Here is some more text and <a href="#" target="_blank">this is a link</a></p>';
// Keep only the listed tags (text inside removed tags is kept).
$output = strip_tags($string, '<p><b><br><img><a>');
// Remove every attribute from every remaining opening tag, img and a included.
$output = preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i", '<$1$2>', $output);
echo $output;
I want the script to turn $string into:
<p>This is some text<br><img src="waves.jpg" width="150" height="200" /></p><p><b>Title</b><br>Here is some more text and <a href="#" target="_blank">this is a link</a></p>
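The two-list idea can be expressed with the question's own regex by switching to preg_replace_callback(): only tags whose name is in a "stripped" list lose their attributes, and every other tag that survives strip_tags() is passed through untouched. This is a minimal sketch, assuming the markup is simple enough for a regex (for example, no `>` inside attribute values); the helper name strip_attributes_from() is hypothetical. Note that strip_tags() removes the `<div>` tags but keeps their text, so "This is the button" survives, which differs from the desired output above.

```php
<?php
// Hypothetical helper: removes all attributes from opening tags whose name is in
// $stripped (lowercase names), and returns every other tag unchanged.
function strip_attributes_from(string $html, array $stripped): string
{
    return preg_replace_callback(
        '/<([a-z][a-z0-9]*)\b[^>]*?(\/?)>/i',
        function (array $m) use ($stripped): string {
            if (in_array(strtolower($m[1]), $stripped, true)) {
                // Rebuild the tag from its name and optional self-closing slash.
                return '<' . $m[1] . $m[2] . '>';
            }
            // Tag not in the "stripped" list (e.g. img, a): leave it alone.
            return $m[0];
        },
        $html
    );
}

$string = '<p style="width: 250px;">This is some text<div class="button">This is the button</div><br><img src="waves.jpg" width="150" height="200" /></p><p><b>Title</b><br>Here is some more text and <a href="#" target="_blank">this is a link</a></p>';

// List 1 + list 2: every tag that may stay at all.
$output = strip_tags($string, '<p><b><br><img><a>');
// List 1 only: tags whose attributes are removed.
$output = strip_attributes_from($output, ['p', 'b', 'br']);
echo $output;
```

Tags in neither list are handled by strip_tags(); the callback only decides between "strip" and "keep as is".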
This function walks an element's children recursively: it deletes child elements that are in neither list, strips all attributes from the elements in the "stripped" list, and leaves the elements in the "allowed" list untouched.
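function clean($element, $allowed, $stripped) {
    // Both tag lists must be arrays, and there must be an element to clean.
    if (!is_array($allowed) || !is_array($stripped)) return;
    if (!$element) return;

    $toDelete = array();
    foreach ($element->childNodes as $child) {
        // Skip text, comment and other non-element nodes.
        if (!isset($child->tagName)) continue;
        $n = $child->tagName;

        // Tags in neither list are deleted, together with their content.
        if ($n && !in_array($n, $allowed) && !in_array($n, $stripped)) {
            $toDelete[] = $child;
            continue;
        }

        // Tags in the "stripped" list lose all of their attributes.
        if ($n && in_array($n, $stripped)) {
            // Collect the names first, so the attribute list is not modified while iterating it.
            $attr = array();
            foreach ($child->attributes as $a) {
                $attr[] = $a->nodeName;
            }
            foreach ($attr as $a) {
                $child->removeAttribute($a);
            }
        }

        // Recurse into every element that is kept.
        clean($child, $allowed, $stripped);
    }

    // Delete after the loop, so childNodes is not modified while iterating it.
    foreach ($toDelete as $del) {
        $element->removeChild($del);
    }
}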
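This is how to use it on your string. The first array lists the tags that are left alone, the second the tags whose attributes are stripped: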
$xhtml = '<p style="width: 250px;">This is some text<div class="button">This is the button</div><br><img src="waves.jpg" width="150" height="200" /></p><p><b>Title</b><br>Here is some more text and <a href="#" target="_blank">this is a link</a></p>';
$dom = new DOMDocument();
$dom->loadHTML($xhtml);
$body = $dom->getElementsByTagName('body')->item(0);
clean($body, array('img', 'a'), array('p', 'br', 'b'));
echo preg_replace('#^.*?<body>(.*?)</body>.*$#s', '$1', $dom->saveHTML($body));
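The regex on the last line only cuts the `<body>` wrapper out of saveHTML()'s output. As an alternative sketch, assuming PHP 5.3.6 or later (where DOMDocument::saveHTML() accepts a node argument), each child of `<body>` can be serialized on its own and concatenated; inner_html() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the serialized HTML of a node's children,
// without the node's own opening and closing tags.
function inner_html(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<p>Hi <b>there</b></p>');
$body = $dom->getElementsByTagName('body')->item(0);
echo inner_html($body);
```

In the answer's code, the final line would then become `echo inner_html($body);`, with no pattern matching on the generated markup.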
For more details, see the PHP manual's documentation for the DOM classes (DOMDocument, DOMNode, DOMElement).