Reputation: 222821
I am using HTML Purifier to remove all unnecessary or malicious HTML tags.
$html = 'dirty html provided by user';
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'p,a[href]' /* , ... other tags */);
$purifier = new HTMLPurifier($config);
$output = $purifier->purify($html);
It works really nicely, but I want to do a little bit more. I want to change all my <a href='link'>...</a> to something like <a href='somefunc(link)' rel="nofollow" target="_blank"> ... </a>.
After searching for a while, I found a relevant link, but the solution there requires patching a complex library (which is not really a good idea), and it is also fairly complicated.
Reading through their forum post, it looks like the way to add the nofollow attribute is $config->set("HTML.Nofollow", true);, but I still can't find how to modify every link.
My current solution is to parse the purified HTML myself and modify each link, but I think there should be a way to do this through HTML Purifier.
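For reference, the manual post-processing approach mentioned above (parsing the purified output and editing each link) can be sketched with PHP's bundled DOM extension instead of string manipulation. This is a minimal sketch, not an HTML Purifier API: somefunc() is a hypothetical placeholder for your own URL transformation, and the example.com redirect URL is made up for illustration.

```php
<?php

// Hypothetical stand-in for the asker's somefunc(link): sends every link
// through a made-up redirect URL. Replace with your own transformation.
function somefunc(string $url): string
{
    return 'https://example.com/out?url=' . rawurlencode($url);
}

// Rewrites every <a href="..."> in an already-purified HTML fragment.
function rewriteLinks(string $html): string
{
    $doc = new DOMDocument();
    // The leading meta tag tells libxml the input is UTF-8 (its default is
    // ISO-8859-1); @ suppresses warnings about tags libxml does not know.
    @$doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html
    );

    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // leave anchors without an href untouched
        }
        $a->setAttribute('href', somefunc($a->getAttribute('href')));
        $a->setAttribute('rel', 'nofollow');
        $a->setAttribute('target', '_blank');
    }

    // loadHTML() wrapped the fragment in <html><body>; serialize only the
    // children of <body> so the caller gets a fragment back.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

echo rewriteLinks('<p>See <a href="http://example.org/">this page</a>.</p>'), "\n";
```

Running this after purify() means the DOM pass only sees markup HTML Purifier has already accepted, and, unlike a regex, it does not care about attribute order or quote style. The cost is a second parse of the output.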
Upvotes: 4
Views: 3147
Reputation: 185
HTML Purifier offers an API for URI mangling.
See http://htmlpurifier.org/docs/enduser-uri-filter.html
Basically, you create a filter class like this:
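class HTMLPurifier_URIFilter_MyPostFilter extends HTMLPurifier_URIFilter
{
    public $name = 'MyPostFilter';
    public $post = true; // run after the URI has been validated

    public function prepare($config) { return true; }

    public function filter(&$uri, $config, $context) {
        // ... extra code here: inspect or replace $uri
        return true; // returning false rejects the URI
    }
}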
You do your magic in the filter() method. Have a look at the documentation for the semantics of the URI object ($uri) that gets passed in.
You can then activate the filter with:
$uri = $config->getDefinition('URI');
$uri->addFilter(new HTMLPurifier_URIFilter_MyPostFilter(), $config);
Upvotes: 2
Reputation: 222821
Actually, I found a partial solution in one of the links on the forum.
This is what I need to do:
$config->set('HTML.Nofollow', true);
$config->set('HTML.TargetBlank', true);
So the full thing looks like this:
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Nofollow', true);
$config->set('HTML.TargetBlank', true);
$config->set('HTML.Allowed', 'a[href],b,strong,i,em,u');
$purifier = new HTMLPurifier($config);
$output = $purifier->purify($html);
Upvotes: 8
Reputation: 4854
You can use preg_replace(). The regex would be:
/<a href='(\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|])'>([a-zA-Z0-9\s._\-]*)<\/a>/
So the code would be:
$pattern = "/<a href='(\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|])'>([a-zA-Z0-9\s._\-]*)<\/a>/";
$replacement = "<a href='$1' rel='nofollow' target='_blank'>$2</a>";
$html = preg_replace($pattern, $replacement, $html);
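If you also want to transform the URL, calling somefunction("$1") inside the replacement string will not work: it runs once, before preg_replace(), and receives the literal text "$1" rather than each matched URL. Use preg_replace_callback() so the function is called with every captured URL:
$html = preg_replace_callback($pattern, function ($m) {
    return "<a href='" . somefunction($m[1]) . "' rel='nofollow' target='_blank'>" . $m[2] . "</a>";
}, $html);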
The regex explained, with examples.
Edit: Adding attributes to links in HTML Purifier:
$def = $config->getHTMLDefinition(true);
$def->addAttribute('a', 'target', 'Enum#_blank,_self,_parent,_top');
More about adding attributes in HTML Purifier
Upvotes: 1