Reputation: 31

php Xpath getting innerHTML with innerHTML tags

I have an HTML file formatted like this:

<p class="p1">subject</p>
<p class="p2">detail <span>important</span></p>

<p class="p1">subject</p>
<p class="p2">detail<span>important</span></p>

I wrote some PHP code to automatically extract each p1 and its detail and insert them into my MySQL table.

This is my code:

$doc = new DOMDocument();
$doc->loadHTMLFile("file.html");
$xpath = new DOMXPath($doc);

// Select every paragraph in the document
$subject = $xpath->query('//p');

// Loop up to length, not length - 1, so the last paragraph is not skipped
for ($i = 0; $i < $subject->length; $i++) {
    if ($subject->item($i)->getAttribute("class") == "p1") {
        echo $subject->item($i)->nodeValue;
    }
}
...

This is not my full code, but the problem is:

echo $subject->item($i)->nodeValue;

Which gives me "detail important", without the <span> tag.

It is important to keep the span tags around the "important" part of the detail. Is there a function that can do this without too much hassle?

Thanks in advance

Upvotes: 2

Answers (3)

Marco Marsala

Reputation: 2462

Old question, but there is a one-liner. The OP should use:

$subject = $xpath->query('//p/*');

and then:

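echo $doc->saveHTML($subject->item($i));

With the * you get the child elements of the paragraph, serialized with their tags but without the wrapping paragraph tag; without the * you get the HTML including the wrapping paragraph. Note that * matches only element nodes, so text sitting directly inside the paragraph (such as "detail ") is not selected; use node() instead of * if the text nodes are needed as well.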

Full example:

$html = '<div><p>ciao questa è una <b>prova</b>.</p></div>';
$dom = new DOMDocument(); // the constructor takes a version and an encoding, not markup
@$dom->loadHTML($html);   // @ suppresses parser warnings
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('.//div/*'); // with * you get the div's child elements (no surrounding div tag); without * you get the div itself, tag included
$innerHtml = $dom->saveHTML($nodes->item(0)); // saveHTML() takes a single node, not a node list
var_dump($innerHtml);

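Output: the paragraph from inside the div ("ciao questa è una prova."), serialized with its <p> and <b> tags and without the surrounding div.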
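
For the markup in the question, where the paragraph mixes plain text with a span, a minimal sketch of an inner-HTML helper might look like the following. The function name innerHtml is hypothetical (it is not part of the DOM extension), and the sample markup is copied from the question. It walks childNodes rather than using the * selector, so text nodes are serialized along with elements.

```php
<?php
// Hypothetical helper, not part of PHP's DOM extension: serialize every
// child node of $node (text nodes included) and concatenate the results.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<p class="p1">subject</p><p class="p2">detail <span>important</span></p>');
$xpath = new DOMXPath($doc);

// Pick the first detail paragraph and print what is inside it.
$detail = $xpath->query('//p[@class="p2"]')->item(0);
$detailHtml = innerHtml($detail);
echo $detailHtml, "\n";
```

With this input the echoed string should be the detail text with the span tag intact, and without the wrapping p tag.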

Upvotes: 0

user1008735

Reputation: 31

I found the answer to my question :) Thanks to SimpleHTMLDOM

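// $html is assumed to be a document already loaded with SimpleHTMLDOM
$detail = '';
foreach ($html->find('p') as $element) {
    switch ($element->class) {
        case 'p1':
            $subject = $element;
            break;
        case 'p2':
            // Using the element as a string yields its markup, tags included
            $detail .= html_entity_decode($element);
            break;
    }
}

The trick is in: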

html_entity_decode($element);
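
For comparison, the same subject/detail pairing can be sketched with PHP's built-in DOM extension instead of SimpleHTMLDOM. This is only an illustration under two assumptions: each p2 paragraph immediately follows its p1 paragraph, as in the question, and the sample markup below (including the names "subject A", "more" and "urgent") is made up for the example. The $rows array is likewise a hypothetical stand-in for whatever gets inserted into the database.

```php
<?php
// Made-up sample markup modeled on the file described in the question.
$source = '<p class="p1">subject A</p><p class="p2">detail <span>important</span></p>'
        . '<p class="p1">subject B</p><p class="p2">more<span>urgent</span></p>';

$doc = new DOMDocument();
$doc->loadHTML($source);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('//p[@class="p1"]') as $subjectNode) {
    // Assumption: the detail is the paragraph right after the subject.
    $detailNode = $xpath
        ->query('following-sibling::p[1][@class="p2"]', $subjectNode)
        ->item(0);
    if ($detailNode === null) {
        continue; // subject without a detail paragraph
    }

    // Serialize the detail's children so inline tags such as <span> survive.
    $detail = '';
    foreach ($detailNode->childNodes as $child) {
        $detail .= $doc->saveHTML($child);
    }

    $rows[] = ['subject' => $subjectNode->textContent, 'detail' => $detail];
}

print_r($rows);
```

Each entry in $rows then holds the plain-text subject and the detail with its span markup, ready to be passed to a prepared INSERT statement.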

Upvotes: 1

Supreme Pizza

Reputation: 9

Whenever I need to parse HTML, I run it through SimpleHTMLDOM:

http://simplehtmldom.sourceforge.net/

I recommend using version 1.11. For various reasons, 1.5 is rather broken.

Upvotes: 0
