kasakka

Reputation: 838

Simple HTML Dom: How to remove elements?

I would like to use Simple HTML DOM to remove all images from an article so I can easily create a short text snippet for a news ticker, but I haven't figured out how to remove elements with it.

Basically, I would:

  1. Get content as HTML string
  2. Remove all image tags from content
  3. Limit content to x words
  4. Output.

Any help?

Upvotes: 38

Views: 55244

Answers (11)

Mike

Reputation: 39

Below I remove the first HEADER node and all SCRIPT nodes from the incoming URL, using the find() function in two different ways. With an index as the 2nd parameter, find() returns a single node; omit the 2nd parameter to get an array of all matching nodes, then loop through them.

$clean_html = file_get_html($url);

// Find and remove the 1st instance of a node (find() returns null if there is none).
$node = $clean_html->find('header', 0);
if ($node) {
    $node->remove();
}

// Find and remove all instances of a node.
$nodes = $clean_html->find('script');
foreach ($nodes as $node) {
    $node->remove();
}

Upvotes: 0

Jaím Trabilsi

Reputation: 11

This works now:

$element->remove();

You can see the documentation for the method here.

Upvotes: 0

Skywalker

Reputation: 1774

Try this:

$dom = new Dom();
$dom->loadStr($text);
foreach ($dom->find('element') as $element) {
   $element->delete();
}

Upvotes: 0

Marco Sánchez

Reputation: 50

Use outerhtml instead of outertext:

<div id='your_div'>the contents of your div</div>

$your_div->outertext = '';
echo $your_div; // echoes <div id='your_div'></div>

$your_div->outerhtml = '';
echo $your_div; // echoes nothing

Upvotes: 0

Lucas

Reputation: 10303

Adding a new answer, since removeNode is definitely a better way of removing elements:

$html->removeNode('img');

This method was probably not available when the accepted answer was marked. You do not need to loop over the HTML to find each element; this call removes them all.

Upvotes: 0

marcelde

Reputation: 271

The proposed solutions are quite expensive and practically unusable in a big loop or any other kind of repetition.

I prefer to use "soft deletes":

foreach ($html->find('somecondition') as $item) {
    // $someCheck is a placeholder for whatever condition applies.
    if ($someCheck) {
        $item->setAttribute('softDelete', true); // <= set a marker to check in later code
    }
    $item->outertext = '';
}

// Later code skips anything that carries the marker.
foreach ($foo as $bar) {
    if (!$bar->getAttribute('softDelete')) {
        // do something
    }
}

Upvotes: 2

Sid

Reputation: 4512

I think you are having difficulties because you forgot to save (that is, dump the internal DOM tree back into a string).

Try this:

$html = file_get_html("http://example.com");

foreach ($html->find('img') as $item) {
    $item->outertext = '';
}

$html->save();

echo $html;

Upvotes: 14

Dr. Reshef

Reputation: 301

When you only blank the outer text, you delete the HTML content itself, but if you perform another find for the same elements, they will still appear in the result. The reason is that the Simple HTML DOM object still holds its internal structure for the element, only without its actual content. To really delete the element, simply reload the HTML as a string into the same variable. This way the Simple HTML DOM object is rebuilt without the deleted content.

Here is an example function:

public function removeNode($selector)
{
    foreach ($this->find($selector) as $node)
    {
        $node->outertext = '';
    }

    $this->load($this->save());        
}

Put this function inside the simple_html_dom class and you're good.

Upvotes: 30

baniadams

Reputation: 19

This is working for me:

foreach($html->find('element') as $element){
   $element = NULL;
}

Upvotes: 1

JaseC

Reputation: 3209

I could not figure out where to put the function so I just put the following directly in my code:

$html->load($html->save());

It basically locks the changes made in the foreach loop back into the HTML, as described above.

Upvotes: 5

Gordon

Reputation: 316979

There are no dedicated methods for removing elements. You just find all the img elements and then set

$e->outertext = '';
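
For the full pipeline in the question (strip images, limit to x words, output), here is a minimal sketch. Note that it swaps in PHP's built-in DOMDocument rather than Simple HTML DOM, so it runs without the library. The function name ticker_snippet and the sample markup are hypothetical, made up for illustration.

```php
<?php
// Hypothetical helper: removes every <img> element, then returns the first
// $maxWords words of the remaining text. Uses the built-in DOM extension,
// not Simple HTML DOM.
function ticker_snippet(string $html, int $maxWords): string
{
    // Collect parser warnings internally instead of printing them,
    // since real-world article markup is rarely perfectly valid.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Copy the live node list into an array first: removing nodes while
    // iterating a live DOMNodeList would skip elements.
    foreach (iterator_to_array($doc->getElementsByTagName('img')) as $img) {
        $img->parentNode->removeChild($img);
    }

    // Split the remaining text on whitespace and keep the first $maxWords words.
    $text  = trim($doc->documentElement->textContent);
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    return implode(' ', array_slice($words, 0, $maxWords));
}

echo ticker_snippet('<p>Breaking <img src="a.png" alt="x"> news from the ticker desk</p>', 3);
// prints "Breaking news from"
```

With Simple HTML DOM the removal step would be the outertext loop shown above; the word limiting works the same way on whatever plain text you extract afterwards.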

Upvotes: 55
