Reputation: 31
I have an HTML file formatted like this:
<p class="p1">subject</p>
<p class="p2">detail <span>important</span></p>
<p class="p1">subject</p>
<p class="p2">detail<span>important</span></p>
I wrote some PHP code to automatically get each p1 and its detail so that I can insert them into my MySQL table.
This is my code:
$doc = new DOMDocument();
$doc->loadHTMLFile("file.html");
$xpath = new DomXpath($doc);
$subject = $xpath->query('//p');
// Loop up to length (not length - 1), otherwise the last <p> is skipped
for ($i = 0; $i < $subject->length; $i++) {
    if ($subject->item($i)->getAttribute("class") == "p1")
        echo $subject->item($i)->nodeValue;
}
...
This is not my full code, but the problem is:
echo $subject->item($i)->nodeValue;
Which gives me <p>detail important</p>, without the <span></span> tag.
It is very important to keep the span tags around the "important" part of the detail. Is there a function that can do this without too much of a headache?
Thanks in advance
Upvotes: 2
Views: 2278
Reputation: 2462
Old question, but there is a one-liner. The OP should use:
$subject = $xpath->query('//p/*');
and then:
echo $doc->saveHtml($subject->item($i));
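With the * you select the child elements of each paragraph, so the wrapping paragraph tag is not included; without the * you get the paragraph itself, including its tags. Note that * matches element children only, so text sitting directly inside the paragraph (such as "detail") is skipped; use //p/node() if you need the text nodes as well.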
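To make the difference between the three queries concrete, here is a minimal sketch run against the markup from the question. The helper name dumpMatches is made up for illustration, and the class filter in the XPath expressions is an assumption added to keep the output short.

```php
<?php
// Sample markup taken from the question.
$html = '<p class="p1">subject</p><p class="p2">detail <span>important</span></p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: serialize every node matched by an XPath query.
function dumpMatches(DOMDocument $doc, DOMXPath $xpath, string $query): string
{
    $out = '';
    foreach ($xpath->query($query) as $node) {
        // saveHTML() accepts a single DOMNode, so serialize the matches one by one
        $out .= $doc->saveHTML($node);
    }
    return $out;
}

// Child elements only: the text "detail " is not part of the result.
$elementsOnly = dumpMatches($doc, $xpath, '//p[@class="p2"]/*');
// All child nodes, text included: this is the paragraph's inner HTML.
$allChildren = dumpMatches($doc, $xpath, '//p[@class="p2"]/node()');
// The paragraph itself, wrapping tag included.
$whole = dumpMatches($doc, $xpath, '//p[@class="p2"]');

echo $elementsOnly, "\n", $allChildren, "\n", $whole, "\n";
```

For the OP's case the node() variant is probably the one to use, since the detail text and the span both have to survive.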
Full example:
$html = '<div><p>ciao questa è una <b>prova</b>.</p></div>';
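$dom = new DOMDocument(); // the constructor takes a version and encoding, not the HTML string
// Prefix an encoding hint so the UTF-8 "è" is not misread as Latin-1
@$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('.//div/*'); // with * you get the div's child elements; without * you get the div itself
$innerHtml = '';
foreach ($nodes as $node) {
    $innerHtml .= $dom->saveHTML($node); // saveHTML() takes one DOMNode, not a DOMNodeList
}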
var_dump($innerHtml);
Output: <p>ciao questa è una <b>prova</b>.</p>
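Applied to the OP's file, the same idea can be sketched as follows: walk the paragraphs in document order, remember each p1 subject, and pair it with the inner HTML of the p2 that follows. The helper innerHtml and the $rows array are illustrative names, not part of the DOM extension, and the markup is inlined here instead of being read from file.html. The database insert is left out.

```php
<?php
// Markup from the question, inlined so the sketch is self-contained.
$html = <<<HTML
<p class="p1">subject</p>
<p class="p2">detail <span>important</span></p>
<p class="p1">subject</p>
<p class="p2">detail<span>important</span></p>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: concatenate the serialized children of a node,
// which keeps inline tags such as <span> intact.
function innerHtml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

$rows = [];
$subject = null;
foreach ($xpath->query('//p') as $p) {
    $class = $p->getAttribute('class');
    if ($class === 'p1') {
        // Subjects are plain text, so textContent is enough here.
        $subject = $p->textContent;
    } elseif ($class === 'p2' && $subject !== null) {
        $rows[] = ['subject' => $subject, 'detail' => innerHtml($p)];
        $subject = null;
    }
}

print_r($rows);
```

Each entry in $rows then holds one subject and its detail with the span tags preserved, ready to be passed to a prepared statement.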
Upvotes: 0
Reputation: 31
I found the answer to my question :) Thanks to SimpleHTMLDOM
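// $html is assumed to be a SimpleHTMLDOM object, e.g. from file_get_html('file.html')
$detail = '';
foreach ($html->find('p') as $element) {
    switch ($element->class) {
        case 'p1':
            $subject = $element;
            break;
        case 'p2':
            $detail .= html_entity_decode($element);
            break;
    }
}
The trick is in this line, which turns the element into its outer HTML string (tags included) and decodes any entities:
html_entity_decode($element);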
Upvotes: 1
Reputation: 9
Whenever I need to parse HTML, I run it through SimpleHTMLDOM:
http://simplehtmldom.sourceforge.net/
I recommend using version 1.11. For various reasons, 1.5 is rather broken.
Upvotes: 0