user1008735

Reputation: 31

php Xpath getting innerHTML with innerHTML tags

I have an HTML file formatted like this:

<p class="p1">subject</p>
<p class="p2">detail <span>important</span></p>

<p class="p1">subject</p>
<p class="p2">detail<span>important</span></p>

I wrote some PHP code to automatically get each p1 and its detail and insert them into my MySQL table.

This is my code:

$doc = new DOMDocument();
$doc->loadHTMLFile("file.html");
$xpath = new DOMXPath($doc);
$subject = $xpath->query('//p');

for ($i = 0; $i < $subject->length - 1; $i++) {
    if ($subject->item($i)->getAttribute("class") == "p1") {
        echo $subject->item($i)->nodeValue;
    }
}
...

This is not my full code, but the problem is:

echo $subject->item($i)->nodeValue;

which gives me <p>detail important</p>, without the <span></span> tags.

It is important to keep the span tags around the "important" part of the detail. Is there a function that can do that without a headache?

Thanks in advance

Upvotes: 2

Views: 2278

Answers (3)

Marco Marsala

Reputation: 2462

Old question, but there is a one-liner. The OP should use:

$subject = $xpath->query('//p/*');

and then:

echo $doc->saveHtml($subject->item($i));

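With the * you get the child elements of the paragraph (without the wrapping paragraph tag); without the * you get the HTML including the wrapping paragraph. Note that * matches element children only, so text that sits directly inside the paragraph is not selected; use node() instead of * if you need the text nodes as well.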

Full example:

$html = '<div><p>hi, this is a <b>test</b>.</p></div>';
$dom = new DOMDocument();
@$dom->loadHTML($html); // @ suppresses libxml parse warnings
$xpath = new DOMXPath($dom);
// with * you get the child elements of the div (no surrounding div tag); without * you get the div itself
$nodes = $xpath->query('.//div/*');
$innerHtml = $dom->saveHTML($nodes->item(0)); // saveHTML() takes a single node, not a node list
var_dump($innerHtml);

Output: <p>hi, this is a <b>test</b>.</p>
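For the markup in the question, where the paragraph mixes plain text with a span, a minimal sketch of an inner-HTML helper could look like the following. The function name innerHtml and the inline $source string are assumptions made for illustration and are not part of the DOM extension; the idea is simply to serialize every child node, text nodes included, with saveHTML().

```php
<?php
// Hypothetical helper (not a built-in): concatenates the serialized
// child nodes of $node, so text and nested tags are both kept.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Sample markup copied from the question (assumed inline instead of file.html)
$source = '<p class="p1">subject</p><p class="p2">detail <span>important</span></p>';

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($source);

$xpath = new DOMXPath($doc);
foreach ($xpath->query('//p[@class="p2"]') as $p) {
    echo innerHtml($p), "\n";
}
```

With the sample markup this should print the detail text together with its span, without the wrapping p tag.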

Upvotes: 0

user1008735

Reputation: 31

I found the answer to my question :) Thanks to SimpleHTMLDOM

foreach ($html->find('p') as $element) {
    switch ($element->class) {
        case 'p1':
            $subject = $element;
            break;
        case 'p2':
            $detail .= html_entity_decode($element);
            break;
    }
}

The trick is in:

html_entity_decode($element);

Upvotes: 1

Supreme Pizza

Reputation: 9

Whenever I need to parse HTML, I run it through SimpleHTMLDOM:

http://simplehtmldom.sourceforge.net/

I recommend using version 1.11. For various reasons, 1.5 is rather broken.

Upvotes: 0
