Reputation: 485
I am using the code below to extract values from an HTML file. It returns a single block of text. How can I improve the code so that the elements of that block are extracted into a clean table?
File:
<div class=class1>
<a href="">txt1</a>
<div class=lvl2>
<p>hello1</p>
</div>
<a href="">txt2</a>
<div class=lvl2>
<p>hello2</p>
</div>
</div>
Code:
$doc = new DOMDocument();
@$doc->loadHTMLFile('file.htm');
$xpath = new DOMXPath($doc);
$list = $xpath->evaluate("//div[contains(@class, 'class1')]");
foreach ($list as $element)
{
echo '<p>' . $element->nodeValue . PHP_EOL . '</p>';
}
Desired output:
txt1 | hello1
txt2 | hello2
Upvotes: 1
Views: 236
Reputation: 3580
Or, you could do it this way if you want to output each matching div (each table) separately. It assumes both queries return their nodes in the same order, which I don't think is always guaranteed with XML / XPath, but in practice it usually is with most implementations:
$doc = new DOMDocument();
$doc->loadHTMLFile('file.htm');
$xpath = new DOMXPath($doc);
$list = $xpath->evaluate("//div[contains(@class, 'class1')]");
foreach ($list as $element)
{
    // The leading "." keeps each query relative to $element;
    // a bare "//a" would search the whole document instead.
    $column1 = $xpath->query(".//a", $element);
    $column2 = $xpath->query(".//div/p", $element);
    for ($i = 0; $i < $column1->length; $i++) {
        echo $column1->item($i)->nodeValue . ' | ' . $column2->item($i)->nodeValue . PHP_EOL;
    }
}
I've removed the @ error suppression from the loadHTMLFile method. I don't think you want to use it: if loading fails you will get errors later on anyway, and leaving it out makes the cause of the problem more explicit.
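If the parser warnings themselves are the reason for the @, one alternative is to let libxml buffer them so they can be inspected rather than hidden. The following is a minimal sketch, not part of the original answer: the function name loadHtmlCollectingErrors is hypothetical, and it uses loadHTML on a non-empty string (instead of loadHTMLFile) only so that it is self-contained.

```php
<?php
// Hypothetical helper: parses an HTML string and returns [DOMDocument|null, string[]].
// Parser messages are collected instead of being suppressed with "@".
// Assumes $html is a non-empty string.
function loadHtmlCollectingErrors(string $html): array
{
    // Buffer libxml messages internally; remember the previous setting.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);

    // Copy the buffered messages into plain strings.
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
    }

    // Empty the buffer and restore the earlier error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$loaded ? $doc : null, $messages];
}
```

The caller can then decide whether a non-empty message list is worth logging or should stop the script.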
Amended: here's another way you could structure the loop if you don't want to iterate separately over both columns. It may still fail, though, if the number of rows in each column doesn't match in the HTML:
foreach ($list as $element)
{
    // ".//a" is relative to $element; "//a" would match every link in the document.
    $column1 = $xpath->query(".//a", $element);
    for ($i = 0; $i < $column1->length; $i++) {
        $field1 = $column1->item($i);
        // First div sibling that follows the link.
        $field2 = $xpath->query("following-sibling::div", $field1)->item(0);
        echo $field1->nodeValue . ' | ' . trim($field2->nodeValue) . PHP_EOL;
    }
}
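To cover the mismatch case mentioned above, the pairing could return rows and leave the second column empty when a link has no div directly after it. This is only a sketch under that assumption: the function name pairLinksWithDivs is hypothetical, and the expression following-sibling::*[1][self::div] (the immediately following element, only if it is a div) is swapped in for the plain following-sibling::div so that a link never borrows the div belonging to a later link.

```php
<?php
// Hypothetical helper: returns [[linkText, divText], ...] for one container node.
function pairLinksWithDivs(DOMXPath $xpath, DOMNode $container): array
{
    $rows = [];
    // ".//a" is evaluated relative to $container, not the whole document.
    foreach ($xpath->query('.//a', $container) as $link) {
        // Take the element right after the link, but only if it is a <div>.
        $div = $xpath->query('following-sibling::*[1][self::div]', $link)->item(0);
        // A link without such a <div> gets an empty second column.
        $rows[] = [
            trim($link->nodeValue),
            $div === null ? '' : trim($div->nodeValue),
        ];
    }
    return $rows;
}
```

With the sample file from the question, calling this for the class1 div should give one row per link, which can then be printed with implode(' | ', $row).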
Upvotes: 1
Reputation: 3580
How about this?
$doc = new DOMDocument();
@$doc->loadHTMLFile('file.htm');
$xpath = new DOMXPath($doc);
$list = $xpath->evaluate("//div[contains(@class, 'class1')]/a");
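foreach ($list as $element)
{
    // Skip text nodes (whitespace) until the next element sibling is reached.
    $nextElement = $element->nextSibling;
    while ($nextElement !== null && $nextElement->nodeType != XML_ELEMENT_NODE) {
        $nextElement = $nextElement->nextSibling;
    }
    // Leave the second column empty if the link has no following element.
    $text = $nextElement === null ? '' : trim($nextElement->nodeValue);
    echo $element->nodeValue . ' | ' . $text . PHP_EOL;
}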
I wasn't quite sure why you wanted <p> as well as PHP_EOL, so I left those out, but you can put them back in where you need them.
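If the markup was there because the result should end up as an HTML table rather than pipe-separated text, the output step could look roughly like this. It is a sketch only: renderRowsAsTable is a hypothetical name, and it assumes the extracted values have already been collected into an array of [column1, column2] pairs.

```php
<?php
// Hypothetical helper: turns [[col1, col2], ...] into an HTML table string.
function renderRowsAsTable(array $rows): string
{
    $html = '<table>' . PHP_EOL;
    foreach ($rows as $row) {
        $cells = '';
        foreach ($row as $cell) {
            // Escape each value so extracted text cannot inject markup.
            $cells .= '<td>' . htmlspecialchars($cell) . '</td>';
        }
        $html .= '<tr>' . $cells . '</tr>' . PHP_EOL;
    }
    return $html . '</table>';
}
```

For example, echo renderRowsAsTable([['txt1', 'hello1'], ['txt2', 'hello2']]); would emit one table row per pair.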
Upvotes: 0