Reputation: 188
I have tried every method I could find, without luck.
If I query an XPath like /html/head/title, or select by class name, I get a result. The issue is that my HTML doesn't contain any distinctive class or id that I can use to target the data I need.
My HTML file: https://slv.tipp.sk/wp-content/uploads/strazcalv/7259/7259_original.html
I want to extract the following values from the HTML file with XPath:
//Parse the HTML DOM element to save additional data as taxonomy
$downloaded_html = new DOMDocument();
$downloaded_html->loadHTMLFile($filename);
/* error_log("HTML DOM ELEMENT");
error_log(print_r($downloaded_html,true)); */
$xpath = new DOMXPath($downloaded_html);
/* error_log("XPATH ELEMENT");
error_log(print_r($xpath,true)); */
$okres = $xpath->query("//table[1]//tbody[1]//tr[1]//td[4]");
$kat_uzemie = $xpath->query("/html/body/div[1]/table[1]/tbody/tr[3]/td[4]")->item(0)->textContent;
$kodku = $xpath->query("/html/body/div[1]/table[1]/tbody/tr[3]/td[3]")->item(0)->textContent;
//Desired $okres value is Komárno
error_log("OKRES OBJECT:");
error_log(print_r($okres,true));
error_log(var_dump($okres,true));
error_log("OKRES STRING:");
error_log($okres->item(0)->textContent);
But all the values are empty. I have tried both relative and absolute XPath expressions, without luck.
This query works correctly:
$okres = $xpath->query("//p[@class='black20Bold']");
and the result is: VÝPIS Z LISTU VLASTNÍCTVA č. 7259
Can someone point me in the right direction as to what the problem might be? Thanks.
Upvotes: 0
Views: 285
Reputation: 120654
There are a few problems with your code, but the main issue is that you are referencing a tbody which doesn't exist in the HTML file. The browser will automatically insert a tbody into the DOM when it is missing, but PHP's DOMDocument does not do that. Secondly, DOMXPath::query() will always return a node list, whereas you seem to want the text content, so you can use DOMXPath::evaluate() instead:
$okres = $xpath->evaluate('string(//table[1]/tr[1]/td[4]/text())');
As an aside, I needed to remind myself that indices in XPath are 1-based, not 0-based. So in the expression above we are looking for the first table, not the second.
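To illustrate the tbody point, here is a minimal sketch. It assumes a small inline table as a stand-in for your file: the markup and the cell values (including "okres-value") are made up for the demo, and only the DOMDocument / DOMXPath calls are the real API.

```php
<?php
// Hypothetical stand-in for the downloaded file: a table with no <tbody>,
// which is how the markup in the question is described.
$html = <<<HTML
<html><body><div>
<table>
<tr><td>c1</td><td>c2</td><td>c3</td><td>okres-value</td></tr>
</table>
</div></body></html>
HTML;

// Collect parser warnings quietly instead of emitting them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// A path copied from the browser's inspector includes tbody, but the
// parsed document has none, so the node list comes back empty.
$withTbody = $xpath->query('//table[1]/tbody/tr[1]/td[4]');

// The same path without tbody matches the cell.
$withoutTbody = $xpath->query('//table[1]/tr[1]/td[4]');

// evaluate() with string() returns the text directly instead of a node list.
$okres = $xpath->evaluate('string(//table[1]/tr[1]/td[4])');

echo $withTbody->length, "\n";    // 0
echo $withoutTbody->length, "\n"; // 1
echo $okres, "\n";
```

If you would rather keep query(), check the node list's length before calling item(0), since item(0) returns null on an empty list and reading textContent from it fails.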
Upvotes: 2