Reputation: 26153
$str = '
<body>
<table><tr><td><b class="1">1</b></td></tr></table>
<table><tr><td><b class="2">1</b></td></tr></table>
<p>some text</p>
</body>';
$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);
foreach($xpath->query('//table[//b[contains(@class, "2")]]') as $i)
    print_r($i);
echo "------------------------------------------\n";
foreach($xpath->query('//table//b[contains(@class, "2")]/ancestor::table') as $i)
    print_r($i);
The first XPath selects both tables while the second one gets only the target (second) table. Why?
Upvotes: 2
Views: 91
Reputation: 17935
There's a bug in your XPath predicate [//b...]. It should be [.//b...] instead.
Explanation:
[...] are predicates; they only act as filters. When you say a[b], you select all a nodes which satisfy the predicate [b]. If a and b are element names, a[b] selects, from the current context node, all a elements that have a b element as a child.
//b is an AbbreviatedAbsoluteLocationPath and selects all b element nodes in the entire document. Both tables are in a document with a b element that qualifies, therefore the predicate [//b] is always true for your document, no matter where you apply it.
.//b is an AbbreviatedRelativeLocationPath and selects all b element nodes which are descendants of the context node (its children and their children, recursively). The predicate [.//b] will only be true for table elements which have a descendant element b.
Path expressions like //b or .//b, when used as predicates like [//b] or [.//b], are true if the node-set they select is not empty.
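To make the difference between the two paths concrete, here is a minimal sketch that reuses the markup from the question and counts the b elements each path selects when a table is the context node. The variable names ($html, $counts) are only illustrative; the context node is passed as the second argument of DOMXPath::evaluate().

```php
<?php
// Same markup as in the question.
$html = '<body>
<table><tr><td><b class="1">1</b></td></tr></table>
<table><tr><td><b class="2">1</b></td></tr></table>
<p>some text</p>
</body>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$counts = [];
foreach ($xpath->query('//table') as $table) {
    // The second argument makes $table the context node of the expression.
    $counts[] = [
        // Absolute path: starts at the document root and ignores $table.
        'absolute' => (int) $xpath->evaluate('count(//b)', $table),
        // Relative path: only looks at descendants of $table.
        'relative' => (int) $xpath->evaluate('count(.//b)', $table),
    ];
}

print_r($counts);
```

For both tables the absolute count is 2 (every b in the document), while the relative count is 1 (only the b inside that table).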
The inner predicate [contains(@class, "2")] doesn't change the fact that the outer predicate is true for every table, because the path starts with //b instead of .//b:
//b[contains(@class, "2")] selects all b elements in the entire document that contain "2" in their class attribute. You're basically performing a check on the document, not on the tree below your desired table element, and that document check is satisfied for both table elements because both are in a document which contains a b element that has "2" in its class attribute.
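The "document check" versus "subtree check" can be observed directly. The sketch below, again using the question's markup, evaluates the content of the predicate as a boolean once per table, first with the absolute path and then with the relative one. The names $inner and $results are made up for this illustration.

```php
<?php
// Same markup as in the question.
$html = '<body>
<table><tr><td><b class="1">1</b></td></tr></table>
<table><tr><td><b class="2">1</b></td></tr></table>
<p>some text</p>
</body>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// The step shared by both variants of the predicate.
$inner = 'b[contains(@class, "2")]';

$results = [];
foreach ($xpath->query('//table') as $table) {
    $results[] = [
        // Checks the whole document, whatever the context node is.
        'document' => $xpath->evaluate("boolean(//$inner)", $table),
        // Checks only the subtree below the current table.
        'subtree'  => $xpath->evaluate("boolean(.//$inner)", $table),
    ];
}

var_dump($results);
```

The document check is true for both tables; the subtree check is false for the first table and true for the second.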
Upvotes: 2
Reputation: 22617
The accepted answer corrects the mistake, but does not really explain why the original path expression went wrong.
Your first expression looks like:
//table[//b[contains(@class, "2")]]
It has two predicates, one nested inside the other:
//table[//b[contains(@class, "2")]]
           ^---------------------^ inner predicate
       ^--------------------------^ outer predicate
Think of a predicate as a filter applied to the intermediate result of the expression on its left. In the extreme cases, either none or all of the intermediate result nodes are discarded by such a predicate.
Each intermediate result node is only kept if the predicate to its right evaluates to true. In the case of the inner predicate:
//b[contains(@class, "2")]
//b yields a set of intermediate b element nodes (all b element nodes in the entire document) that are then filtered by the predicate [contains(@class, "2")]. Given your input document, the expression inside the predicate only returns true for one of the b elements.
But //b[contains(@class, "2")] in turn serves as the content of the outer predicate:
//table[outer predicate]
Now //table selects as an intermediate result all table element nodes in the entire document, and for each of them, the expression inside the predicate is checked.
Importantly, the outer predicate [//b[contains(@class, "2")]] will return true for both table elements. This is because for both of them it is true that somewhere in the entire document, there is a b element whose class attribute contains 2.
What you actually wanted to do is evaluate the outer predicate expression from the perspective of each table element, and the accepted answer shows how to do that: use .// instead of // in the predicate.
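As a quick check, here is a small sketch that runs the original expression and the corrected one side by side against the markup from the question. The variable names ($broken, $fixed) are only for illustration.

```php
<?php
// Same markup as in the question.
$html = '<body>
<table><tr><td><b class="1">1</b></td></tr></table>
<table><tr><td><b class="2">1</b></td></tr></table>
<p>some text</p>
</body>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Original expression: the predicate is evaluated against the whole document.
$broken = $xpath->query('//table[//b[contains(@class, "2")]]');
// Corrected expression: the predicate is evaluated relative to each table.
$fixed  = $xpath->query('//table[.//b[contains(@class, "2")]]');

echo "original:  ", $broken->length, " table(s)\n";
echo "corrected: ", $fixed->length, " table(s)\n";

// Serialize the matched table(s) to see which one was selected.
foreach ($fixed as $table) {
    echo $dom->saveHTML($table), "\n";
}
```

The original expression matches two tables, the corrected one matches only the table that contains the b element with class "2".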
Upvotes: 4