Using PHP and xPath to extract clean table of text

Question

I am using the code below to extract values from an HTML file. The code returns a single block of text. I want to know how to improve the code so that it extracts the elements of this block of text into a clean table.

File:

Code:

$doc = new DOMDocument();
@$doc->loadHTMLFile('file.htm');

$xpath = new DOMXPath($doc);

$list = $xpath->evaluate("//div[contains(@class, 'class1')]");

foreach ($list as $element)
{
    echo $element->nodeValue . PHP_EOL;
}

Desired output:

 txt1 | hello1
 txt2 | hello2

Dan King · Accepted Answer

Alternatively, you could do it this way if you want to make sure each table is output separately. It assumes both node lists come back in matching order, which I don't think is strictly guaranteed with XML / XPath, but in practice it usually holds with most implementations:

$doc = new DOMDocument();
$doc->loadHTMLFile('file.htm');

$xpath = new DOMXPath($doc);

$list = $xpath->evaluate("//div[contains(@class, 'class1')]");

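foreach ($list as $element)
{
    // The leading "." makes each query relative to $element;
    // a bare "//a" would search the whole document on every pass.
    $column1 = $xpath->query(".//a", $element);
    $column2 = $xpath->query(".//div/p", $element);

    // Pair the nth link with the nth paragraph.
    for ($i = 0; $i < $column1->length; $i++) {
        echo $column1->item($i)->nodeValue . ' | ' . $column2->item($i)->nodeValue . PHP_EOL;
    }
}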
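To see why the context node matters when querying inside the loop, here is a minimal sketch. The markup is hypothetical, since the original file was not shown in the question; only the difference between an absolute path (`//a`) and a relative one (`.//a`) is the point.

```php
<?php
// Hypothetical markup: the real file.htm was not included in the question.
$html = <<<HTML
<div class="class1">
  <a href="#">txt1</a>
  <div><p>hello1</p></div>
</div>
<div class="class1">
  <a href="#">txt2</a>
  <div><p>hello2</p></div>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Use the first matching block as the context node.
$first = $xpath->query("//div[contains(@class, 'class1')]")->item(0);

// "//a" is absolute: it starts at the document root and ignores the context node.
$absolute = $xpath->query("//a", $first);

// ".//a" is relative: it only looks below the context node.
$relative = $xpath->query(".//a", $first);

echo 'absolute: ' . $absolute->length . PHP_EOL; // absolute: 2
echo 'relative: ' . $relative->length . PHP_EOL; // relative: 1
```

With the absolute form, every pass of the outer loop would see all links in the document, so rows would be repeated once per block.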

I've removed the @ error suppression from the loadHTMLFile call. I don't think you want it: if loading fails you will get errors later on anyway, and leaving it out makes the cause of the problem more explicit.
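If the file contains markup that the parser complains about, one option is to collect the parser's messages instead of either printing or silencing them. The following is only a sketch: it uses the standard libxml_use_internal_errors(), libxml_get_errors() and libxml_clear_errors() functions, and the input string is a made-up stand-in for file.htm. Whether a given piece of markup produces a message depends on the libxml version installed.

```php
<?php
// Buffer libxml parse messages rather than emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();

// Hypothetical input standing in for file.htm; <foo> is not an HTML tag,
// which some libxml versions report as a parse error.
$loaded = $doc->loadHTML('<div class="class1"><foo>txt1</foo></div>');

// Fetch whatever was buffered, then empty the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

if ($loaded === false) {
    exit('Could not parse the document' . PHP_EOL);
}

foreach ($errors as $error) {
    // Each entry is a LibXMLError object with message, line and column properties.
    echo 'Line ' . $error->line . ': ' . trim($error->message) . PHP_EOL;
}
```

This keeps the failure visible, which is the goal of dropping the @, without flooding the output when the HTML is merely untidy.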

Amended: here's another way you could structure the loop if you don't want to iterate separately over both columns. It may still fail, though, if the number of rows in each column doesn't match in the HTML:

foreach ($list as $element)
{
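    // ".//a" keeps the search inside $element; "//a" would scan the whole document.
    $column1 = $xpath->query(".//a", $element);

    for ($i = 0; $i < $column1->length; $i++) {
        $field1 = $column1->item($i);
        // First <div> that follows this link at the same nesting level.
        $field2 = $xpath->query("following-sibling::div", $field1)->item(0);

        echo $field1->nodeValue . ' | ' . trim($field2->nodeValue) . PHP_EOL;
    }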
    }
}
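For the case where a link has no matching value, a slightly more defensive variant might look like the sketch below. The markup is again hypothetical, and the stricter expression `following-sibling::*[1][self::div]` (take the next element sibling only if it is a div) is my own substitution, not something the question requires. Rows are collected into an array first so they can be printed, or turned into an HTML table, in one place.

```php
<?php
// Hypothetical markup: the second link has no <div> after it.
$html = <<<HTML
<div class="class1">
  <a href="#">txt1</a><div>hello1</div>
  <a href="#">txt2</a>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query("//div[contains(@class, 'class1')]//a") as $link) {
    // item(0) returns null when the next element sibling is missing or not a <div>.
    $value = $xpath->query("following-sibling::*[1][self::div]", $link)->item(0);

    $rows[] = [
        trim($link->nodeValue),
        $value === null ? '' : trim($value->nodeValue),
    ];
}

// Print one "label | text" line per row.
foreach ($rows as [$label, $text]) {
    echo $label . ' | ' . $text . PHP_EOL;
}
```

A missing value then shows up as an empty second column instead of a fatal error on a null node.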
