kylex

Retrieve text using DomDocument, but remove inner h1 tag

I have some HTML and want to retrieve the text of a div, but without the content of the <h1> tag nested inside it.

$html = '<div class="mytext">
           <h1>Title of document</h1>
           This is the text that I want, without the title.
         </div>';

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // set options before loading the markup
$dom->loadHTML($html);
$xp = new DOMXPath($dom);
foreach ($xp->query('//div[@class="mytext"]') as $node) {
  // nodeValue concatenates the text of every descendant, including the <h1>
  $description = $node->nodeValue;
  echo $description;
}

End result should be: This is the text that I want, without the title.

Currently it's: Title of document This is the text that I want, without the title

How can I just get the text without the h1 tag?

Answers (1)

RainDev

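Query the text nodes that are direct children of the div instead of the div itself. The text() step skips the content of the <h1>, and the [normalize-space()] predicate drops whitespace-only text nodes. Try this: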

foreach ($xp->query('//div[@class="mytext"]/text()[normalize-space()]') as $node) {
   // the text node still carries the surrounding indentation and newlines
   $description = trim($node->nodeValue);
   echo $description;
}
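
For completeness, here is a minimal self-contained sketch of the same idea. The helper name directText and its parameters are made up for illustration and are not from the question. It assumes the class name contains no double quotes, since it is pasted straight into the XPath expression, and it joins multiple direct text nodes with a single space, which is just one possible choice.

```php
<?php
// Hypothetical helper (not from the original post): returns the text that sits
// directly inside the matching <div>, ignoring text nested in child elements.
function directText(string $html, string $class): string
{
    // Collect libxml parse warnings internally instead of emitting them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($dom);
    $parts = [];
    // text() matches only direct text-node children of the div;
    // [normalize-space()] filters out whitespace-only nodes.
    // Assumption: $class contains no double quotes.
    $query = '//div[@class="' . $class . '"]/text()[normalize-space()]';
    foreach ($xp->query($query) as $node) {
        $parts[] = trim($node->nodeValue);
    }
    return implode(' ', $parts);
}

$html = '<div class="mytext">
           <h1>Title of document</h1>
           This is the text that I want, without the title.
         </div>';

echo directText($html, 'mytext');
```

With the markup from the question this should print only the sentence after the title, without the surrounding whitespace.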
