gom

Reputation: 897

Use htmlspecialchars but allow <a href="URL">...</a>

I want to run user input through htmlspecialchars but still allow <a href="URL">...</a>.
Extra whitespace inside the tag should also be accepted, as in < a href =.


$pureHTML = htmlspecialchars($dirtyHTML, ENT_QUOTES);

I probably need a preg_replace on $pureHTML afterwards, but what should the pattern be? Or would you recommend using HTML Purifier instead?

I want to allow the anchor tag pair with the href attribute only. Other attributes, such as onclick and target, are not allowed.

Upvotes: 0

Views: 1271

Answers (2)

cwurtz

Reputation: 3257

I'm not sure you can do this in a single replace, since you would have to match and replace the "<a" and ">" surrounding the href while leaving the href itself intact. The closing "</a>" is an easy replace, though. There may be a way to do it all in a single preg_replace, but I'm not proficient enough in regex to do so. Anyway, I would do:

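// Escape everything first; ENT_NOQUOTES leaves the quotes around the href value literal.
$pureHTML = htmlspecialchars($dirtyHTML, ENT_NOQUOTES);
// Match escaped opening tags with a single href attribute, e.g. &lt; a href = "url" &gt;.
// "?", "=" and ";" are in the character class so query strings and &amp; still match.
preg_match_all('/(&lt;\s*a)\s+(href\s*=\s*"[\w:\/@#%_\-&\.?=;]+")\s*(&gt;)/i', $pureHTML, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    // Put the real tag back, keeping only the matched href attribute.
    $pureHTML = str_replace($match[0], "<a " . $match[2] . ">", $pureHTML);
}
// Put back escaped closing tags such as &lt;/a&gt; or &lt;/ a &gt;.
$pureHTML = preg_replace('/(&lt;\/\s*a\s*&gt;)/i', '</a>', $pureHTML);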

Basically, it matches the escaped form of (<a)(href="url")(>), allowing for spaces between each part (and between "<" and "a"). It then replaces each match with a literal <a href="url">.

After that, it does a regex replace of the escaped form of </a> (again allowing for spaces) with a literal </a>.
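
For what it's worth, the same idea can probably be done in one pass with preg_replace_callback. The following is only a sketch, not a hardened sanitizer: the function name allowAnchors is made up for illustration, and the rule that only http:// and https:// links are re-enabled is my own assumption (it keeps javascript: URLs escaped), not something the question asked for.

```php
<?php
// Hypothetical helper: escape everything, then re-enable only
// <a href="..."> and </a> in a single pass over the escaped text.
function allowAnchors(string $dirtyHTML): string
{
    // Escape all markup; ENT_NOQUOTES keeps the quotes around href values literal.
    $escaped = htmlspecialchars($dirtyHTML, ENT_NOQUOTES);

    // Alternative 1: escaped opening tag with exactly one href attribute.
    // Alternative 2: escaped closing tag. Optional whitespace is allowed in both.
    $pattern = '/&lt;\s*a\s+href\s*=\s*"([^"<>\s]+)"\s*&gt;|&lt;\s*\/\s*a\s*&gt;/i';

    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[1] ?? '';
        if ($url === '') {
            // No captured URL means the closing-tag alternative matched.
            return '</a>';
        }
        // Assumption: only http(s) links are wanted; anything else stays escaped.
        if (!preg_match('/^https?:\/\//i', $url)) {
            return $m[0];
        }
        return '<a href="' . $url . '">';
    }, $escaped);
}

echo allowAnchors('< a href = "http://example.com/?a=1&b=2" >link</ a> <b>bold</b>'), PHP_EOL;
```

An opening tag with any extra attribute (onclick, target, ...) does not match the pattern, so it is left in its escaped form.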

Upvotes: 2

z1m.in

Reputation: 1661

I think you need the strip_tags() function.

$pureHTML = strip_tags($html, '<a>');
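
One thing to be aware of before relying on this: strip_tags() removes the other tags rather than escaping them, and it leaves every attribute on the allowed tags in place. A small demonstration, with a made-up input string:

```php
<?php
// Made-up input: one disallowed tag and an anchor carrying an onclick attribute.
$html = '<b>Hi</b> <a href="http://example.com" onclick="evil()">link</a>';

// Keep only <a> tags; every other tag is removed, but its text content stays.
$result = strip_tags($html, '<a>');

// The <a> tag survives with all of its attributes, including onclick.
echo $result, PHP_EOL;
```

So on its own this does not satisfy the "href only" requirement from the question; the attributes would still need separate filtering.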

Upvotes: 1
