Reputation: 499
$body = preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $body);
Hello there. I found a preg_replace call that finds all HTML tags and removes their attributes. I need to exclude the <a> tag from that regexp, so for example:
<sth a="awdawd"/><a href="http://awdwsrrdg.com"/>
should be changed to:
<sth/><a href="http://awdwsrrdg.com"/>
Any help would be appreciated.
Upvotes: 0
Views: 54
Reputation: 1597
Try this regexp:
/<([b-z][a-z0-9]*)[^>]*?(\/?)>/i
Change the first character class in the group from [a-z] to [b-z]. Now every tag whose name starts with a (including <a>) will be ignored.
$body = preg_replace("/<([b-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $body);
$pattern = '/<([b-z][a-z0-9]*)[^>]*?(\/?)>/i';
$replacement = '<$1$2>';
$text = '<sth a="awdawd"/><a href="http://awdwsrrdg.com"/>';
Output:
<sth/><a href="http://awdwsrrdg.com"/>
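One caveat with [b-z]: it skips every tag whose name begins with a, such as <abbr> or <article>, not only <a>. Below is a minimal sketch of a narrower variant, assuming a negative lookahead is acceptable. This is a swapped-in technique, not the pattern from this answer, and $input is a made-up sample:

```php
<?php
// Hypothetical sample: <abbr> should lose its attributes, <a> should keep them.
$input = '<abbr title="x">HTML</abbr> <sth a="awdawd"/><a href="http://awdwsrrdg.com"/>';

// (?!a\b) rejects a tag only when its name is exactly "a";
// names that merely start with "a" (abbr, article, ...) still match.
$pattern = '/<(?!a\b)([a-z][a-z0-9]*)[^>]*?(\/?)>/i';

$result = preg_replace($pattern, '<$1$2>', $input);
echo $result, PHP_EOL; // <abbr>HTML</abbr> <sth/><a href="http://awdwsrrdg.com"/>
```

The lookahead consumes no characters, so the capture groups and the replacement string stay the same as in the original pattern.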
Upvotes: 0
Reputation: 158080
Don't use regular expressions to parse or modify HTML/XML. That works only for a few simple cases, not in a real-world application.
Use a DOM parser instead:
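$html = '<sth a="awdawd"/><a href="http://awdwsrrdg.com"/>';
// Suppress warnings about tags the HTML parser does not know, such as <sth>.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
// Select every attribute whose parent element is not <a>.
$selector = new DOMXPath($doc);
foreach ($selector->query('//@*[not(parent::a)]') as $attr) {
    $attr->parentNode->removeAttribute($attr->nodeName);
}
echo $doc->saveHTML();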
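Note that saveHTML() on the whole document also prints the doctype and the <html>/<body> wrapper that loadHTML() adds. Here is a sketch of one way to get only the fragment back, using the same XPath query. The function name stripAttributesExceptAnchors is made up for illustration:

```php
<?php
// Hypothetical helper: removes every attribute except those on <a> elements
// and returns only the fragment, without the <html>/<body> wrapper.
function stripAttributesExceptAnchors(string $html): string
{
    // Collect parser warnings (e.g. for unknown tags such as <sth>) quietly.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Every attribute whose parent element is not <a>.
    foreach ($xpath->query('//@*[not(parent::a)]') as $attr) {
        $attr->ownerElement->removeAttribute($attr->nodeName);
    }

    // Serialize only the children of <body>, one node at a time.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

$clean = stripAttributesExceptAnchors('<sth a="awdawd"/><a href="http://awdwsrrdg.com"/>');
echo $clean, PHP_EOL;
```

The serializer decides how empty elements are written, so the exact markup may differ from the input (for example, an explicit closing tag instead of />).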
Upvotes: 4
Reputation: 31035
It is well known that you should not use regex to parse XHTML (use an HTML parser instead), since the regex engine can get things wrong on unexpected characters unless you know exactly what input you will face.
On the other hand, if you do want to use regex, you can use the discard technique with this pattern:
<a\b.*?\/>(*SKIP)(*FAIL)|<(\w+).*?>
PHP code:
$re = '/<a\b.*?\/>(*SKIP)(*FAIL)|<(\w+).*?>/';
$str = '<sth a="awdawd"/><a href="http://awdwsrrdg.com"/>';
$subst = '<$1 />';
$result = preg_replace($re, $subst, $str);
// $result is now: <sth /><a href="http://awdwsrrdg.com"/>
If you want to use your regex, you can add the discard pattern at the beginning, like this:
<a\b.*?\/>(*SKIP)(*FAIL)|<([a-z][a-z0-9]*)[^>]*?(\/?)>
          ^------^----- Discard pattern flags
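Put together with the replacement string from the question, that could look like the following sketch (the sample input is the one from the question):

```php
<?php
$body = '<sth a="awdawd"/><a href="http://awdwsrrdg.com"/>';

// Self-closing <a .../> tags are matched first and discarded by (*SKIP)(*FAIL);
// every other tag falls through to the original attribute-stripping pattern.
$re = '/<a\b.*?\/>(*SKIP)(*FAIL)|<([a-z][a-z0-9]*)[^>]*?(\/?)>/i';

$body = preg_replace($re, '<$1$2>', $body);
echo $body, PHP_EOL; // <sth/><a href="http://awdwsrrdg.com"/>
```

Keep in mind that the discard branch only recognizes <a> tags written as <a ... />. For an opening tag like <a href="x"> the lazy .*? runs on to the next /> in the text, so if your input has those, a discard branch such as <a\b[^>]*> may fit better (an assumption about your data, untested against it).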
Upvotes: 2