Gabriel Smoljar

Reputation: 1236

Remove HTML element from parsed HTML document on a condition

I've parsed an HTML document using PHP Simple HTML DOM Parser. In the parsed document there's a ul tag with some li tags in it. One of these li tags contains one of those dreaded "Add This" buttons, which I want to remove.

To make matters worse, the list item has no class or id, and it is not always in the same position in the list. So there is no easy way (correct me if I'm wrong) to remove it with the parser.

What I want to do is to search for the string 'addthis.com' in all li-elements and remove any element that contains that string.

<ul>
    <li>Foobar</li>
    <li>addthis.com</li><!-- How do I remove this? -->
    <li>Foobar</li>
</ul>

FYI: This is purely a hobby project in my quest to learn PHP and not a case of content theft for profit.

All suggestions are welcome!

Upvotes: 3

Views: 1619

Answers (3)

Stano

Reputation: 8949

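This solution uses the DOMDocument class and the DOMNode::removeChild method. The matching elements are collected first and removed in a second loop, because the list returned by getElementsByTagName is live and changes while nodes are being removed: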

$str="<ul><li>Foobar</li><li>addthis.com</li><li>Foobar</li></ul>";
$remove='addthis.com';
$doc = new DOMDocument();
$doc->loadHTML($str);
$elements = $doc->getElementsByTagName('li');
$domElemsToRemove = array();
foreach ($elements as $element) {
  $pos = strpos($element->textContent, $remove); // or similar $element->nodeValue
  if ($pos !== false) {
    $domElemsToRemove[] = $element;
  }
}
foreach( $domElemsToRemove as $domElement ){
  $domElement->parentNode->removeChild($domElement);
}
$str = $doc->saveHTML(); // the <ul> now holds only the two Foobar items (loadHTML wraps it in <html><body>)
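
As a variation on the loop above, the matching could also be left to an XPath expression. This is only a minimal sketch: the function name removeItemsContaining is hypothetical (it is not part of any library), and it assumes the search string contains no double quote, since XPath 1.0 has no way to escape one inside a quoted literal.

```php
<?php
// Hypothetical helper (not a library function): removes every <li> whose
// text contains $needle and returns the re-serialized HTML.
// Assumption: $needle contains no double-quote character.
function removeItemsContaining(string $html, string $needle): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of printing them,
    // since real-world markup is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Select every <li> whose string value contains the needle.
    $nodes = $xpath->query('//li[contains(., "' . $needle . '")]');

    // Copy the result into a plain array before modifying the tree.
    foreach (iterator_to_array($nodes) as $node) {
        $node->parentNode->removeChild($node);
    }

    return $doc->saveHTML();
}

$str = '<ul><li>Foobar</li><li>addthis.com</li><li>Foobar</li></ul>';
echo removeItemsContaining($str, 'addthis.com');
```

Because contains(.) looks at the whole text of each li, this also catches the string when it sits inside a nested element such as a link's text, but not when it appears only in an attribute value like href.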

Upvotes: 1

Adam

Reputation: 36703

I couldn't find a method to remove nodes explicitly, but you can remove an element by setting its outertext to an empty string.

$html = new simple_html_dom();
$html->load(file_get_contents("test.html"), false, false); // preserve formatting

foreach($html->find('ul li') as $element) {
  if (count($element->find('a.addthis_button')) > 0) {
    $element->outertext="";
  }
}

echo $html;

Upvotes: 3

Hans Wassink

Reputation: 2577

Well, what you can do is use jQuery in the browser after the parsing. Something like this:

$('li').each(function(i) {
    if ($(this).html().indexOf("addthis.com") !== -1) { // match any li that contains the string
        $(this).remove();
    }
});

Upvotes: 1
