domizai

Reputation: 353

Add attribute to HTML tag depending on inner pattern using PHP

I already found a workable solution, and it's called regex. Yes, I know, it has been said zillions of times not to use regex for HTML parsing. But here is the thing: as the title says, the change depends on the inner HTML text, which needs to follow a certain pattern, so I need regex anyway. I tried using the DOM library first, but I failed.

So my actual question is: is there a best practice for this? Anyway, here is what I've got:

HTML before:

<section> 
    {foo:bar}
</section>

PHP:

// I'm not a regex ninja, but this seems to do the job

$regexTag = "/<(?!body|head|html|link|script|\!|\/)(\w*)[^>]*>[^{]*{\s*[^>]*:\s*[^>]*\s*[^}]}/";
// $match[0] "<section> {foo:bar}"
// $match[1] "section"


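preg_match_all($regexTag, $html, $match);

for ($i = 0; $i < count($match[0]); $i++) {
    // Offset just past "<tagname": 1 for "<" plus the tag name's length
    $pos = strlen($match[1][$i]) + 1;
    // Insert the attribute at that offset, then swap the result into $html
    $str = substr_replace($match[0][$i], " class='foo'", $pos, 0);
    $html = str_replace($match[0][$i], $str, $html);
}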

HTML after:

<section class='foo'> 
    {foo:bar}
</section>

Upvotes: 1

Views: 725

Answers (2)

domizai

Reputation: 353

So this works:

$elements = $dom->getElementsByTagName('body')->item(0)->childNodes;

for ($i = $elements->length-1; $i >= 0; $i--) { 
   $element = $elements->item($i); 
   $tag =  $element->nodeName;

   foreach ($dom->getElementsByTagName($tag) as $tag) {
       ...

I don't know though, I still feel more comfortable with regex, haha. But I guess this is the way to go.

Upvotes: 0

Amal Murali

Reputation: 76646

A regex is not the correct tool for modifying the markup itself. Stick with the DOM parser approach. Here's a quick solution using the DOMDocument class.

Use getElementsByTagName('*') to get all the tags, and then use in_array() to check whether the tag name is in the list of disallowed tags.

Then use a regex with preg_match() to check whether the text content follows the {foo:bar} pattern. If it does, add the new attributes one by one using the setAttribute() method:

// An array containing all attributes
$attrs = [
    'class' => 'foo'
    /* more attributes & values */
];

$ignored_tags = ['body', 'head', 'html', 'link', 'script'];

$dom = new DOMDocument;
$dom->loadXML($html);

foreach ($dom->getElementsByTagName('*') as $tag) 
{
    // If not a disallowed tag
    if (!in_array($tag->tagName, $ignored_tags)) 
    {
        $textContent = trim($tag->textContent);

        // If $textContent matches the format '{foo:bar}'
        if (preg_match('#{\s*[^>]*:\s*[^>]*\s*[^}]}#', $textContent)) 
        {
            foreach ($attrs as $attr => $val)
                $tag->setAttribute($attr, $val);
        }
    }
}

echo $dom->saveHTML();

Output:

<section class="foo"> 
    {foo:bar}
</section>
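
One thing to watch with the loop above: textContent also includes the text of nested elements, so a wrapper around the <section> would match the pattern as well. Below is a minimal sketch of one way around that, assuming you only want elements whose own direct text nodes match. The function name addAttrsByInnerPattern and the simplified pattern are hypothetical, not part of the original code, and the input is still assumed to be well-formed XML because it is loaded with loadXML().

```php
<?php
// Hypothetical helper (not from the original answer): adds $attrs to every
// element whose OWN text (direct text-node children only) matches $pattern.
function addAttrsByInnerPattern(string $html, array $attrs, array $ignoredTags, string $pattern): string
{
    $dom = new DOMDocument;
    // Assumption: $html is well-formed XML with a single root element.
    $dom->loadXML($html);

    foreach ($dom->getElementsByTagName('*') as $tag) {
        // Skip disallowed tags
        if (in_array($tag->tagName, $ignoredTags, true)) {
            continue;
        }

        // Collect only this element's direct text, ignoring nested elements
        $ownText = '';
        foreach ($tag->childNodes as $child) {
            if ($child->nodeType === XML_TEXT_NODE) {
                $ownText .= $child->nodeValue;
            }
        }

        if (preg_match($pattern, $ownText)) {
            foreach ($attrs as $attr => $val) {
                $tag->setAttribute($attr, $val);
            }
        }
    }

    // Serialize just the root element, so no XML declaration is emitted
    return $dom->saveXML($dom->documentElement);
}

// Simplified stand-in for the pattern used above: "{key:value}"
$pattern = '#\{\s*[^:}]+:\s*[^}]+\}#';
$ignored = ['body', 'head', 'html', 'link', 'script'];
$html    = "<div><section>\n    {foo:bar}\n</section></div>";

$result = addAttrsByInnerPattern($html, ['class' => 'foo'], $ignored, $pattern);
echo $result;
```

Here only the <section> should receive the class, because the wrapping <div> has no text node of its own. For real-world HTML that is not well-formed XML, loadHTML() would probably be the better entry point, though that is a separate change from what this sketch shows.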

Upvotes: 1
