beamkiller

Reputation: 188

Issue with XPath PHP parsing - empty

I have tried every method I could think of, without luck.

A query such as /html/head/title, or one that selects by class name, returns a result. The problem is that my HTML doesn't contain any distinctive class or id that I could use to target the data I need.

My HTML file: https://slv.tipp.sk/wp-content/uploads/strazcalv/7259/7259_original.html

I want to extract the following values from the HTML file with XPath:

    // Parse the HTML DOM element to save additional data as taxonomy
    $downloaded_html = new DOMDocument();

    $downloaded_html->loadHTMLFile($filename);

    /* error_log("HTML DOM ELEMENT");
    error_log(print_r($downloaded_html,true)); */

    $xpath = new DOMXPath($downloaded_html);

    /* error_log("XPATH ELEMENT");
    error_log(print_r($xpath,true)); */

    $okres = $xpath->query("//table[1]//tbody[1]//tr[1]//td[4]");
    $kat_uzemie = $xpath->query("/html/body/div[1]/table[1]/tbody/tr[3]/td[4]")->item(0)->textContent;
    $kodku = $xpath->query("/html/body/div[1]/table[1]/tbody/tr[3]/td[3]")->item(0)->textContent;

    // Desired $okres value is Komárno
    error_log("OKRES OBJECT:");
    error_log(print_r($okres,true));
    error_log(var_dump($okres,true));
    error_log("OKRES STRING:");
    error_log($okres->item(0)->textContent);

But all the values are empty. I have tried both relative and absolute XPath expressions, without luck.

This query works correctly:

$okres = $xpath->query("//p[@class='black20Bold']");

and the result is: VÝPIS Z LISTU VLASTNÍCTVA č. 7259

Can someone point me in the right direction as to what the problem might be? Thanks.

Upvotes: 0

Views: 285

Answers (1)

Sean Bright

Reputation: 120654

There are a few problems with your code, but the main one is that your expressions reference a tbody element that doesn't exist in the HTML file. A browser automatically inserts a tbody into the DOM when it is missing, but PHP's DOMDocument does not. Secondly, DOMXPath::query() returns a node list rather than a string, whereas you seem to want the text content, so you can use DOMXPath::evaluate() instead:

$okres = $xpath->evaluate('string(//table[1]/tr[1]/td[4]/text())');
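To see the tbody difference in isolation, here is a minimal sketch. The inline markup is a made-up stand-in for the real file (an assumption, not its actual contents), and the variable names are only for illustration. It compares a query that includes tbody with one that omits it, then reads the cell text with evaluate():

```php
<?php
// Hypothetical sample markup; like the real file, it has no <tbody>.
// The http-equiv meta tag tells the parser that the input is UTF-8.
$markup = <<<MARKUP
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body><div>
<table>
<tr><td>a</td><td>b</td><td>c</td><td>Komárno</td></tr>
</table>
</div></body></html>
MARKUP;

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// DOMDocument does not add a tbody, so this path matches nothing.
$withTbody = $xpath->query('//table[1]/tbody/tr[1]/td[4]')->length;

// Dropping tbody from the path matches the cell.
$withoutTbody = $xpath->query('//table[1]/tr[1]/td[4]')->length;

// evaluate() with string() returns the text directly instead of a node list.
$okres = $xpath->evaluate('string(//table[1]/tr[1]/td[4])');

echo $withTbody, ' ', $withoutTbody, ' ', $okres, PHP_EOL;
```

If a given document might or might not contain a tbody, a path such as //table[1]//tr[1]/td[4] should match in both cases, at the cost of being less strict about structure.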

As an aside, I needed to remind myself that offsets/indices in XPath are 1-based and not 0-based. So in the expression above we are looking for the first table, not the second.
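The indexing rule can be checked with a small sketch. The two-table markup below is invented for illustration and is not taken from the file in the question. It also shows a related subtlety: //table[1] selects every table that is the first table child of its parent, whereas (//table)[1] selects only the first table in the whole document.

```php
<?php
// Hypothetical markup: two tables, each inside its own div.
$markup = <<<MARKUP
<html><body>
<div><table><tr><td>first</td></tr></table></div>
<div><table><tr><td>second</td></tr></table></div>
</body></html>
MARKUP;

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Positions start at 1, so index 0 never matches anything.
$zeroMatches = $xpath->query('(//table)[0]')->length;

// (//table)[n] indexes into all tables in document order.
$firstCell  = $xpath->evaluate('string((//table)[1]//td)');
$secondCell = $xpath->evaluate('string((//table)[2]//td)');

// //table[1] is evaluated per parent, so it matches both tables here.
$perParentMatches = $xpath->query('//table[1]')->length;

echo $zeroMatches, ' ', $firstCell, ' ', $secondCell, ' ', $perParentMatches, PHP_EOL;
```

When string() is applied to a multi-node result, it uses the first node in document order, which is why string(//table[1]/...) still yields a value from the first table.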

Upvotes: 2
