user2301765

Parsing HTML and removing specific td

I have HTML content like the following:

<table>
  <tr>
    <td>xyx...</td>
    <td>abc....</td>
    <td><span><h3>Downloads</h3></span><br>blah blah blah...</td>
  </tr>
  <tr>
    <td><h3>Downloads</h3>again some content.</td>
    <td>dddd</td>
    <td>kkkl...</td>
  </tr>
</table>

Now I am trying to delete every td that has the word 'Downloads' anywhere in its content. After some research online I got something that runs; the code is as follows:

$res_text = 'MY HTML';

# Create a DOM parser object
$dom = new DOMDocument();

# Parse the HTML.
# The @ before the method call suppresses any warnings that
# loadHTML might throw because of invalid HTML in the page.
@$dom->loadHTML($res_text);         

$selector = new DOMXPath($dom);


$results = $selector->query('//*[text()[contains(.,"Downloads")]]');

if($results->length){
    foreach($results as $res){
        $res->parentNode->removeChild($res);
    }
}

This does delete the word 'Downloads', but only together with the element that directly contains it (the <h3> in the sample above, or a <span> or <p> in other markup). I want the whole <td> deleted along with all of its content.
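To see what the first query actually selects, here is a small self-contained sketch that runs it against the sample markup above and records the tag name of each match. It assumes PHP's DOM extension, and it uses libxml_use_internal_errors() instead of the @ operator to keep parser warnings quiet; the variable names are illustrative only. Because text() only tests an element's direct text children, the matches should be the <h3> elements rather than the enclosing <td> cells.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<table>
  <tr>
    <td>xyx...</td>
    <td>abc....</td>
    <td><span><h3>Downloads</h3></span><br>blah blah blah...</td>
  </tr>
  <tr>
    <td><h3>Downloads</h3>again some content.</td>
    <td>dddd</td>
    <td>kkkl...</td>
  </tr>
</table>
HTML;

// Buffer libxml warnings about imperfect markup instead of suppressing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// The question's original query: any element with a *direct* text child containing "Downloads".
$matched = [];
foreach ($xpath->query('//*[text()[contains(., "Downloads")]]') as $node) {
    $matched[] = $node->nodeName;
}

echo implode(', ', $matched), PHP_EOL;
```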

I also tried:

$results = $selector->query('//td[text()[contains(.,"Downloads")]]');

but it doesn't work. Can someone tell me how to do this?
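A minimal sketch of why that second query comes back empty on the sample markup, assuming the same DOMDocument/DOMXPath setup as in the question: text() is a child step, so the predicate only sees text nodes sitting directly inside the <td>. In the second row, the cell's own text is just "again some content."; the word "Downloads" is one level deeper, inside the <h3>. Variable names here are hypothetical.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<table>
  <tr>
    <td>xyx...</td>
    <td>abc....</td>
    <td><span><h3>Downloads</h3></span><br>blah blah blah...</td>
  </tr>
  <tr>
    <td><h3>Downloads</h3>again some content.</td>
    <td>dddd</td>
    <td>kkkl...</td>
  </tr>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// The question's second query: text() only looks at the <td>'s own child text nodes.
$directMatches = $xpath->query('//td[text()[contains(., "Downloads")]]')->length;

// The direct text children of the first cell in the second row.
$ownText = '';
foreach ($xpath->query('//tr[2]/td[1]/text()') as $textNode) {
    $ownText .= $textNode->nodeValue;
}

echo $directMatches, PHP_EOL; // how many <td> cells the question's query matched
echo $ownText, PHP_EOL;       // "Downloads" is absent: it lives in the nested <h3>
```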

Answers (1)

AyB

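You don't need the text() step in your query. text() only looks at the td's own (direct) text nodes, and in your markup 'Downloads' sits inside a nested <h3>, so nothing matches. contains(., ...) instead tests the string value of the whole td, which includes the text of all its descendants. The query should be: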

$results = $selector->query('//td[contains(.,"Downloads")]');
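To illustrate the difference, the sketch below (assuming PHP's DOM extension; variable names are hypothetical) evaluates the string value of one cell with XPath's string() function and counts the cells matched by the suggested query. The string value concatenates the text of every descendant, so the <h3> text and the trailing text run together, and both cells that mention "Downloads" are selected.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<table>
  <tr>
    <td>xyx...</td>
    <td>abc....</td>
    <td><span><h3>Downloads</h3></span><br>blah blah blah...</td>
  </tr>
  <tr>
    <td><h3>Downloads</h3>again some content.</td>
    <td>dddd</td>
    <td>kkkl...</td>
  </tr>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// "." in contains(., ...) is converted to the node's string value:
// all descendant text nodes joined together, in document order.
$rowTwo = $xpath->evaluate('string(//tr[2]/td[1])');

// The answer's query therefore matches both cells that mention "Downloads".
$count = $xpath->query('//td[contains(., "Downloads")]')->length;

echo $rowTwo, PHP_EOL;
echo $count, PHP_EOL;
```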

The whole code:

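$dom = new DOMDocument();
// Prefix with @ (as in the question) if the HTML is malformed and you want to hide warnings.
$dom->loadHTML($res_text);
$selector = new DOMXPath($dom);

// Select every <td> whose full text (including nested elements) contains "Downloads".
$results = $selector->query('//td[contains(.,"Downloads")]');
if ($results->length) {
    foreach ($results as $res) {
        // Remove the whole cell together with everything inside it.
        $res->parentNode->removeChild($res);
    }
}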

echo htmlentities($dom->saveHTML());
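For a fully self-contained run against the sample markup, here is a sketch under a couple of assumptions: libxml_use_internal_errors() replaces the @ operator, and the predicate is swapped for a variant, //td[.//text()[contains(., "Downloads")]], which requires the word to appear inside a single descendant text node. That variant behaves the same on this markup, but unlike contains(., ...) it would not match a word split across elements such as <b>Down</b>loads; pick whichever fits your HTML. Names like $cells and $remaining are illustrative.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<table>
  <tr>
    <td>xyx...</td>
    <td>abc....</td>
    <td><span><h3>Downloads</h3></span><br>blah blah blah...</td>
  </tr>
  <tr>
    <td><h3>Downloads</h3>again some content.</td>
    <td>dddd</td>
    <td>kkkl...</td>
  </tr>
</table>
HTML;

// Buffer parser warnings, then restore the previous setting.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// Variant predicate: a <td> with at least one descendant text node containing the word.
$cells = $xpath->query('//td[.//text()[contains(., "Downloads")]]');

// The list returned by query() is a snapshot, not a live list,
// so removing nodes while looping over it does not skip entries.
foreach ($cells as $td) {
    $td->parentNode->removeChild($td);
}

$remaining = $xpath->query('//td')->length;
$output = $dom->saveHTML();
echo $output;
```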
