Brandon Buster

Reputation: 1275

Why is this Basic XPath Selector Not Working?

Here's my basic structure:

<div id="PrimaryContentBlock">
    <form>
         ......

I'm trying to select elements from within the form, but XPath isn't finding anything below the PrimaryContentBlock div.

The first query finds the parent node, but the second query finds nothing.

$dom->query('//*[@id="PrimaryContentBlock"]');
$dom->query('//*[@id="PrimaryContentBlock"]/form');

Any idea why XPath would be acting so strangely? I've been seeing a lot of inconsistent behavior when working with DOMXPath queries.

Upvotes: 0

Views: 1293

Answers (2)

matt

Reputation: 79813

One way this could happen is if you have an XHTML document (with an xmlns declaration on the root html element) and you are parsing it as XML. In such a document all the elements are part of the http://www.w3.org/1999/xhtml namespace, and you need to specify this when querying.

Your first query, //*[@id="PrimaryContentBlock"], will find any element with a matching id attribute, including those in the XHTML namespace (that’s what the * means: an element with any name in any namespace). The second query, //*[@id="PrimaryContentBlock"]/form, is looking for form elements that are not in any namespace. This fails to match the document, since all form elements are in the default XHTML namespace.
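A minimal sketch that reproduces the symptom, assuming the page is an XHTML document like the small hypothetical one below (the markup is invented for illustration; only the `PrimaryContentBlock` id comes from the question):

```php
<?php
// Hypothetical minimal XHTML document: the xmlns on <html> puts every
// element into the http://www.w3.org/1999/xhtml namespace.
$xhtml = <<<'EOF'
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="PrimaryContentBlock">
      <form></form>
    </div>
  </body>
</html>
EOF;

$doc = new DOMDocument();
$doc->loadXML($xhtml); // parsed as XML, so the default namespace is honoured

$xpath = new DOMXPath($doc);

// * matches an element in any namespace, so the div is found.
$divs = $xpath->query('//*[@id="PrimaryContentBlock"]');

// An unprefixed "form" means a form element in NO namespace, so nothing matches.
$forms = $xpath->query('//*[@id="PrimaryContentBlock"]/form');

echo $divs->length, "\n";  // 1
echo $forms->length, "\n"; // 0
```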

The simplest way to fix this, if this is an XHTML document, is to parse it as HTML. If you are currently doing something like:

$domdocument->loadXML(...);

change it to use loadHTML:

$domdocument->loadHTML(...);
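A sketch of the `loadHTML` route, again using invented XHTML markup. libxml's HTML parser does not apply XML namespaces, so the `xmlns` ends up as an ordinary attribute and the elements are in no namespace. The `libxml_use_internal_errors(true)` call is only there to keep parser warnings about messier real-world markup out of the output; it isn't part of the fix:

```php
<?php
// Hypothetical XHTML markup, including the xmlns declaration.
$xhtml = <<<'EOF'
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="PrimaryContentBlock">
      <form></form>
    </div>
  </body>
</html>
EOF;

libxml_use_internal_errors(true); // collect HTML parser warnings instead of printing them

$doc = new DOMDocument();
$doc->loadHTML($xhtml); // HTML parsing: elements are not placed in the XHTML namespace

$xpath = new DOMXPath($doc);
$forms = $xpath->query('//*[@id="PrimaryContentBlock"]/form');

echo $forms->length, "\n"; // 1
```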

If you want to parse the document as XML, then you need to specify the namespace in your query. First you need to register the namespace URI and the prefix you are going to use with the DOMXPath instance, then change your query to include the new prefix:

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('xhtml', "http://www.w3.org/1999/xhtml");

$result = $xpath->query('//*[@id="PrimaryContentBlock"]/xhtml:form');
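Put together with the same kind of hypothetical XHTML markup, parsed with `loadXML` so namespaces are kept, a runnable sketch might look like this. The prefix `xhtml` is an arbitrary choice; only the namespace URI has to match the document:

```php
<?php
// Hypothetical XHTML markup; every element is in the XHTML namespace.
$xhtml = <<<'EOF'
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="PrimaryContentBlock">
      <form></form>
    </div>
  </body>
</html>
EOF;

$doc = new DOMDocument();
$doc->loadXML($xhtml); // XML parsing keeps the default namespace

$xpath = new DOMXPath($doc);
// Bind a prefix of our choosing to the namespace URI used by the document.
$xpath->registerNamespace('xhtml', 'http://www.w3.org/1999/xhtml');

// The id attribute is unprefixed (no namespace), so @id still matches as-is.
$forms = $xpath->query('//*[@id="PrimaryContentBlock"]/xhtml:form');

echo $forms->length, "\n";                 // 1
echo $forms->item(0)->localName, "\n";     // form
```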

Upvotes: 1

hek2mgl

Reputation: 158280

Given the above structure, and assuming the document is well-formed, both of your queries WILL work:

$html = <<<EOF
<div id="PrimaryContentBlock">
    <form></form>
</div>
EOF;

$doc = new DOMDocument();
$doc->loadHTML($html); // parsed as HTML, so no namespaces are involved
$selector = new DOMXPath($doc);

foreach ($selector->query('//*[@id="PrimaryContentBlock"]/form') as $element) {
    echo $element->nodeName;
}

Output:

form

If the following sentence is true for you:

I've been seeing a lot of inconsistent behavior when working with DOMXPath queries.

... then either you don't have enough experience with XPath yet, or your input data isn't well-formed. At least one of those two reasons applies to me whenever I have problems with a particular query.

Upvotes: 0
