Reputation: 499

Regexp that deletes all HTML tags' attributes, except <a>

$body = preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $body);

Hello there. I found a preg_replace call that finds all HTML tags and removes their attributes. I need to exclude the <a> tag from that regex so that, for example:

<sth a="awdawd"/><a href="http://awdwsrrdg.com"/>

should be changed to:

<sth/><a href="http://awdwsrrdg.com"/>

Any help would be appreciated.

Upvotes: 0

Answers (3)

pes502

Reputation: 1597

Try this regexp:

/<([b-z][a-z0-9]*)[^>]*?(\/?)>/i

Change the first group's character class from [a-z] to [b-z]. Now every tag that starts with <a will be ignored.

$body = preg_replace("/<([b-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $body);

Working demo:

$pattern = '/<([b-z][a-z0-9]*)[^>]*?(\/?)>/i';
$replacement = '<$1$2>';
$text = '<sth a="awdawd"/><a href="http://awdwsrrdg.com"/>';

Output: <sth/><a href="http://awdwsrrdg.com"/>
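One side effect of [b-z]: it skips every tag whose name starts with "a", not only <a>, so tags such as <abbr>, <address>, <area>, <article>, <aside> and <audio> keep their attributes too. The sketch below compares it with one alternative, a negative lookahead (?!a\b) that skips only the tag named exactly "a". The lookahead is my substitution, not part of this answer, and the sample input is made up for illustration.

```php
<?php
// Made-up sample: <abbr> also starts with "a"
$html = '<abbr title="World Health Organization">WHO</abbr> <a href="http://example.com">link</a>';

// The answer's approach: [b-z] skips every tag whose name starts with "a"
$bz = preg_replace('/<([b-z][a-z0-9]*)[^>]*?(\/?)>/i', '<$1$2>', $html);

// Alternative: (?!a\b) rejects only a tag named exactly "a" (\b fails inside "abbr")
$lookahead = preg_replace('/<(?!a\b)([a-z][a-z0-9]*)[^>]*?(\/?)>/i', '<$1$2>', $html);

echo $bz, "\n";        // <abbr title="World Health Organization">WHO</abbr> <a href="http://example.com">link</a>
echo $lookahead, "\n"; // <abbr>WHO</abbr> <a href="http://example.com">link</a>
```

Closing tags such as </abbr> are left alone by both patterns, because [a-z] cannot match the "/".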

Upvotes: 0

hek2mgl

Reputation: 158080

Don't use regular expressions to parse or modify HTML/XML. That works only for a few simple cases, not in a real-world application.

Use a DOM Parser instead:

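$html = '<sth a="awdawd"/><a href="http://awdwsrrdg.com"/>';

$doc = new DOMDocument();
// <sth> is not a known HTML tag: collect libxml's warnings instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$selector = new DOMXPath($doc);

// Select every attribute whose parent element is not <a> and remove it
foreach ($selector->query('//@*[not(parent::a)]') as $attr) {
    $attr->ownerElement->removeAttribute($attr->nodeName);
}

echo $doc->saveHTML();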
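Calling saveHTML() on the whole document also prints the <!DOCTYPE>, <html> and <body> wrappers that loadHTML() adds around a fragment. A minimal sketch, assuming you only want the cleaned fragment back, wraps the same XPath query in a hypothetical helper stripAttributesExceptA() and serialises just the children of <body>. The helper name and the sample input are illustrative only.

```php
<?php
// Hypothetical helper: removes every attribute except those on <a> elements
// and returns only the cleaned fragment, not a full HTML document.
function stripAttributesExceptA(string $html): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings (e.g. about unknown tags) instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//@*[not(parent::a)]') as $attr) {
        $attr->ownerElement->removeAttribute($attr->nodeName);
    }

    // loadHTML() wraps the fragment in <html><body>; serialise only body's children
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

echo stripAttributesExceptA(
    '<p class="intro">Hi <b id="x">there</b>, <a href="http://example.com" class="ext">link</a></p>'
), "\n";
```

Passing a node to saveHTML() serialises only that node, which is what drops the document wrappers here.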

Upvotes: 4

Federico Piazza

Reputation: 31035

It is well known that you should not use regex to parse (X)HTML (use an HTML parser instead), since the regex engine can mess things up when it meets unexpected characters, unless you know exactly which character set you'll face.

On the other hand, if you want to use regex, you can leverage the discard technique with this regex:

<a\b.*?\/>(*SKIP)(*FAIL)|<(\w+).*?>


PHP code:

$re = '/<a\b.*?\/>(*SKIP)(*FAIL)|<(\w+).*?>/';
$str = '<sth a="awdawd"/><a href="http://awdwsrrdg.com"/>';
$subst = '<$1 />';

$result = preg_replace($re, $subst, $str);
echo $result; // <sth /><a href="http://awdwsrrdg.com"/>

If you want to use your regex, you can add the discard pattern at the beginning, like this:

<a\b.*?\/>(*SKIP)(*FAIL)|<([a-z][a-z0-9]*)[^>]*?(\/?)>
           ^------^-----Discard pattern flags
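Here is a sketch of that combined pattern in PHP, using the question's replacement '<$1$2>' so self-closing tags stay self-closing. One assumption to be aware of: the discard branch <a\b.*?\/> only recognises <a> tags written with "/>", as in the question's sample. For an ordinary <a href="...">...</a> it finds no "/>", so the second branch strips the link's attributes. The second pattern below is my adjustment, not part of this answer: it uses <a\b[^>]*> as the discard branch, which matches an opening <a> tag whether or not it is self-closing. The input is made up for illustration.

```php
<?php
$input = '<sth a="awdawd"/><a href="http://awdwsrrdg.com">link</a><b class="y">bold</b>';

// Discard branch from the answer: protects only self-closing <a .../> tags
$selfClosingOnly = '/<a\b.*?\/>(*SKIP)(*FAIL)|<([a-z][a-z0-9]*)[^>]*?(\/?)>/i';

// Adjusted discard branch: protects any opening <a ...> tag
$anyOpeningA = '/<a\b[^>]*>(*SKIP)(*FAIL)|<([a-z][a-z0-9]*)[^>]*?(\/?)>/i';

echo preg_replace($selfClosingOnly, '<$1$2>', $input), "\n";
// <sth/><a>link</a><b>bold</b>
echo preg_replace($anyOpeningA, '<$1$2>', $input), "\n";
// <sth/><a href="http://awdwsrrdg.com">link</a><b>bold</b>
```

(*SKIP) makes the engine resume after the protected <a> tag instead of retrying inside it, and (*FAIL) rejects that match, so only the second branch ever produces replacements.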

Upvotes: 2
