Reputation: 13920
A simple XPath query that used to work now returns only an empty node list.
The source can be any XML file. Suppose it is:
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="pt-br" xml:lang="pt-br">
<head> <meta charset="utf-8"/><title>test</title> </head>
<body>
<article id="etc"><p>Hello</p><p>Bye</p></article>
</body>
</html>
I rewrote everything from scratch, and here is a complete test:
$dom2 = new DOMDocument;
$dom2->load($pathFile);
$xpath2 = new DOMXPath($dom2);
$entries = $xpath2->query('//p');
// nothing here, the node list is empty:
var_dump($entries); // zero nodes!
foreach ($entries as $entry) {
    echo "Found {$entry->nodeValue},";
}
// but everything shows up here:
foreach ($dom2->getElementsByTagName('*') as $e)
    print "\n name={$e->nodeName}"; // all tags!
What is wrong? Why does the XPath query match nothing?
Upvotes: 1
Views: 52
Reputation: 13920
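It is an old problem with the W3C's XPath 1.0 standard, which DOMXPath follows: an unprefixed name in an XPath expression always refers to an element in no namespace, so it never matches elements that belong to a default namespace. As an old site commented about the surprise this causes XPath beginners,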
One of the commonly asked questions about (...) is:
"Why nothing matched for my XPath expression which seems right to me?"
Common cause of these problems is not properly defining a namespace for XPath.
But the beginners are right: this is ugly behaviour for a "default thing". So let's preserve the beginners' good intuition about what is simple and good.
It is horrible to have to write an XPath expression that does not look like the XML it targets (whose tags have no prefix). Simple tags deserve a simple XPath.
Here is what I consider the best way to fix the ugly behaviour of the XPath query. It is not trivial, because the root's xmlns attribute is read-only, so we need to rebuild the DOM object from a new XML string:
$expTag = 'html'; // config: expected root tag
$expNs = 'http://www.w3.org/1999/xhtml'; // config: expected default namespace
// ...
$e = $dom->documentElement; // root node
// Validate the input against the expected config, then rewrite the root tag:
if ($e->nodeName == $expTag && $e->hasAttribute('xmlns')
    && $e->getAttribute('xmlns') == $expNs) {
    // can't do $e->removeAttribute('xmlns') because it is read-only!
    $xml = $dom->C14N(); // normalizes quotes and removes repeated declarations
    $xml = preg_replace("#^<$expTag (.*?)xmlns=\"[^\"]+\"#", "<$expTag\$1", $xml);
    $dom = new DOMDocument; // loadXML() is an instance method, not a static one
    $dom->loadXML($xml);
} else
    die("\n ERROR: something not expected.\n");
//...
$xpath = new DOMXPath($dom);
$entries = $xpath->query('//p'); // the simple XPath expression works again
This solution should be used only when processing cost is not a constraint, as in digital preservation contexts.
In other practical contexts the problem is the high CPU cost of saving and reloading the full XML as a string, plus the even more expensive C14N method, which makes the XML safe for the regular expression.
Using C14N (which is also good for other things in a digital preservation context) is necessary to ensure that the regular expression behaves correctly. Strictly speaking, the getAttribute() method may be affected by attribute duplication, but we can neglect this "second-order" effect, or move the check into the regular expression.
Upvotes: 0
Reputation: 254876
That's because your XML has a default namespace defined:
xmlns="http://www.w3.org/1999/xhtml"
So you need to register a prefix for that namespace and then search using prefixed tag names:
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$entries = $xpath->query('//x:p');
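For completeness, here is a minimal self-contained sketch of the same idea. It assumes the XHTML sample from the question is supplied as an inline string (the `$xml`, `$plain` and `$found` names are only illustrative), and the prefix `x` is arbitrary: only the namespace URI has to match the document.

```php
<?php
// Inline copy of the question's sample, used here instead of a file on disk.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="pt-br" xml:lang="pt-br">
<head><meta charset="utf-8"/><title>test</title></head>
<body>
<article id="etc"><p>Hello</p><p>Bye</p></article>
</body>
</html>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Unprefixed names only match elements in no namespace, so this finds nothing.
$plain = $xpath->query('//p');

// Bind the arbitrary prefix "x" to the document's default namespace URI.
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$entries = $xpath->query('//x:p');

// Collect the text of every matched paragraph.
$found = [];
foreach ($entries as $entry) {
    $found[] = $entry->nodeValue;
}

echo "unprefixed matches: {$plain->length}\n";       // 0
echo "prefixed matches: " . implode(', ', $found) . "\n"; // Hello, Bye
```

The document itself is left untouched; the prefix exists only inside the DOMXPath object, so there is no need to re-serialize or reload the XML.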
Upvotes: 1