Parsing HTML and removing specific td

Question

I have HTML content like the following...


  
    xyx...
    abc....
    Downloads
blah blah blah...
  
  
    Downloads
again some content.
    dddd
    kkkl...

Now I am trying to delete every 'td' that has the word 'Downloads' anywhere in its content. After some research on the internet I got something running, and the code is as follows...

$res_text = 'MY HTML';

# Create a DOM parser object
$dom = new DOMDocument();

# Parse the HTML from Google.
# The @ before the method call suppresses any warnings that
# loadHTML might throw because of invalid HTML in the page.
@$dom->loadHTML($res_text);         

$selector = new DOMXPath($dom);


$results = $selector->query('//*[text()[contains(.,"Downloads")]]');

if($results->length){
    foreach($results as $res){
        $res->parentNode->removeChild($res);
    }
}

This deletes the word 'Downloads' along with its immediate parent node, but I want the whole td to be deleted along with its content.

I tried...

$results = $selector->query('//td[text()[contains(.,"Downloads")]]');

but it's not working. Can someone tell me how I can do this?

AyB · Accepted Answer

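You don't need the text() in your query. text() only looks at the td's own direct text nodes, so a td where 'Downloads' sits inside a nested element never matches, whereas contains(., ...) tests the td's whole string value, including the text of all its descendants. It should be: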

$results = $selector->query('//td[contains(.,"Downloads")]');
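To see the difference between the two queries, here is a minimal sketch. The sample markup is an assumption (the original HTML from the question is not reproduced here); it just puts 'Downloads' inside a nested p in one cell and directly in another.

```php
<?php
// Hypothetical sample markup, not the asker's real HTML.
// In the second cell "Downloads" sits inside a nested <p>,
// so it is not a direct text-node child of the <td>.
$html = '<table><tr>'
      . '<td>abc</td>'
      . '<td><p>Downloads</p>blah blah blah</td>'
      . '<td>Downloads again</td>'
      . '</tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// text() tests only the td's own direct text nodes.
$direct = $xpath->query('//td[text()[contains(., "Downloads")]]');

// "." is the td's string value: the text of all descendants concatenated.
$any = $xpath->query('//td[contains(., "Downloads")]');

echo $direct->length, "\n"; // 1 (only the third cell)
echo $any->length, "\n";    // 2 (second and third cells)
```

With text() the cell that wraps the word in a p is missed, which would explain why the //td[text()[...]] attempt appeared not to work.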

The whole code:

$dom = new DOMDocument();
$dom->loadHTML($res_text);
$selector = new DOMXPath($dom);
$results = $selector->query('//td[contains(.,"Downloads")]');
if ($results->length) {
    foreach ($results as $res) {
        $res->parentNode->removeChild($res);
    }
}

echo htmlentities($dom->saveHTML());
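If you want to reuse this, the same steps can be wrapped in a function. This is only a sketch under some assumptions: removeCellsContaining() is a made-up helper name, the sample markup is hypothetical, and the search word is assumed to contain no double quote because it is spliced straight into the XPath string. It also uses libxml_use_internal_errors() instead of @ to keep warnings about invalid HTML quiet.

```php
<?php
// removeCellsContaining() is a hypothetical helper, not part of the original answer.
function removeCellsContaining(string $html, string $word): string
{
    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Assumption: $word contains no double quote character.
    $cells = $xpath->query('//td[contains(., "' . $word . '")]');

    // Remove every matching td together with everything inside it.
    foreach ($cells as $cell) {
        $cell->parentNode->removeChild($cell);
    }

    return $dom->saveHTML();
}

// Hypothetical input: one plain cell and one cell with a nested "Downloads".
$html = '<table><tr><td>abc</td><td><p>Downloads</p>blah</td></tr></table>';
$out  = removeCellsContaining($html, 'Downloads');

echo htmlentities($out);
```

The result still contains the 'abc' cell, while the cell holding 'Downloads' is gone along with its nested content.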

