splash58

Reputation: 26153

The same XPaths - different results

$str = '
<body>
<table><tr><td><b class="1">1</b></td></tr></table>
<table><tr><td><b class="2">1</b></td></tr></table>
<p>some text</p>
</body>';

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

foreach($xpath->query('//table[//b[contains(@class, "2")]]') as $i) 
   print_r($i);

echo "------------------------------------------\n";

foreach($xpath->query('//table//b[contains(@class, "2")]/ancestor::table') as $i) 
   print_r($i);

The first XPath selects both tables while the second one gets only the target (second) table. Why?

test on eval.in

Upvotes: 2

Views: 91

Answers (2)

Christian Hujer

Reputation: 17935

There's a bug in your XPath predicate [//b...]. It should be [.//b...] instead.

Explanation: [...] is a predicate, and predicates only act as filters. When you write a[b], you select all a nodes that satisfy the predicate [b]. If a and b are element names, the expression selects, relative to the current context node, all a child elements that have at least one b child element.

  • //b is an AbbreviatedAbsoluteLocationPath and selects all b element nodes in the entire document. Both tables are in a document with a b element that qualifies, therefore the predicate [//b] is always true for your document, no matter where you apply it.
  • .//b is an AbbreviatedRelativeLocationPath and selects all b element nodes which are descendants of the context node (its children and their children, recursively). The predicate [.//b] will only be true for table elements which have a descendant element b.

Location paths like //b or .//b, when used as predicates like [//b] or [.//b], are true if the node-set they select is not empty.

The inner predicate [contains(@class, "2")] doesn't change anything about that, because the path still starts with //b instead of .//b: //b[contains(@class, "2")] selects all b elements in the entire document that contain "2" in their class attribute. You're basically performing a check on the document, not on the tree below your desired table element, and that document check is satisfied for both table elements because both are in a document which contains a b element that has "2" in its class attribute.
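As a minimal sketch of the difference, the two predicates can be run side by side on the markup from the question. The variable names $absolute and $relative are only illustrative.

```php
<?php
// Same markup as in the question.
$str = '
<body>
<table><tr><td><b class="1">1</b></td></tr></table>
<table><tr><td><b class="2">1</b></td></tr></table>
<p>some text</p>
</body>';

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

// Absolute path inside the predicate: evaluated against the whole document,
// so the predicate is true for every table.
$absolute = $xpath->query('//table[//b[contains(@class, "2")]]');

// Relative path inside the predicate: evaluated against each table's own subtree.
$relative = $xpath->query('//table[.//b[contains(@class, "2")]]');

echo $absolute->length, "\n"; // 2
echo $relative->length, "\n"; // 1
```

With the leading dot, only the second table is returned, which matches the result of the ancestor::table query in the question.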

Upvotes: 2

Mathias Müller

Reputation: 22617

The accepted answer corrects the mistake, but does not really explain why the original path expression went wrong.

Your first expression looks like:

//table[//b[contains(@class, "2")]]

It has two predicates, one nested inside the other:

//table[//b[contains(@class, "2")]]
           ^---------------------^       inner predicate
       ^--------------------------^      outer predicate

Think of predicates as filters that are applied to the left context of the predicate. In the extreme cases, either none or all of the intermediate result nodes are discarded by such a predicate.

Each intermediate result node is only kept if the predicate to its right evaluates to true. In the case of the inner predicate:

//b[contains(@class, "2")]

//b yields a set of intermediate b element nodes (all b element nodes in the entire document) that are then filtered by the predicate [contains(@class, "2")]. Given your input XML document, the expression inside the predicate only returns true for one of the b elements.
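The filtering step can be checked in isolation. This is a small sketch on the question's markup (the variable names $allB and $filteredB are made up for illustration) that compares the size of the intermediate node-set with the size of the filtered one.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<body>
<table><tr><td><b class="1">1</b></td></tr></table>
<table><tr><td><b class="2">1</b></td></tr></table>
<p>some text</p>
</body>');
$xpath = new DOMXPath($dom);

// Intermediate result: every b element in the document.
$allB = $xpath->query('//b');

// After the predicate: only the b elements whose class contains "2".
$filteredB = $xpath->query('//b[contains(@class, "2")]');

echo $allB->length, "\n";      // 2
echo $filteredB->length, "\n"; // 1
echo $filteredB->item(0)->getAttribute('class'), "\n"; // 2
```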

But //b[contains(@class, "2")] in turn serves as the content of the outer predicate:

//table[outer predicate]

Now //table selects as an intermediate result all table element nodes in the entire document, and for each of them, the expression inside the predicate is checked.

Importantly, the outer predicate //b[contains(@class, "2")] will return true for both table elements. This is because for both of them it is true that somewhere in the entire document, there is a b element whose class attribute contains 2.

What you actually wanted to do is evaluate the outer predicate expression from the perspective of each table element, and the accepted answer shows how to do that: use .// instead of // in the predicate.
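To see what "from the perspective of each table element" means, one option is to evaluate both predicate expressions by hand with each table as the context node. The sketch below uses DOMXPath::evaluate() with XPath's boolean() function. The $results array and its keys are only illustrative.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<body>
<table><tr><td><b class="1">1</b></td></tr></table>
<table><tr><td><b class="2">1</b></td></tr></table>
<p>some text</p>
</body>');
$xpath = new DOMXPath($dom);

$results = [];
foreach ($xpath->query('//table') as $table) {
    $results[] = [
        // Starts at the document root, so the context node is irrelevant.
        'absolute' => $xpath->evaluate('boolean(//b[contains(@class, "2")])', $table),
        // Starts at the context node, i.e. the current table.
        'relative' => $xpath->evaluate('boolean(.//b[contains(@class, "2")])', $table),
    ];
}

foreach ($results as $n => $r) {
    printf(
        "table %d: absolute=%s relative=%s\n",
        $n + 1,
        var_export($r['absolute'], true),
        var_export($r['relative'], true)
    );
}
// table 1: absolute=true relative=false
// table 2: absolute=true relative=true
```

The absolute expression gives the same answer for every table, so it cannot tell the tables apart. Only the relative one depends on the context node.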

Upvotes: 4
