Reputation: 167
I am trying to get the table rows from an XHTML document using XPath / XSLT. My sample XHTML looks like this:
<body>
<....>
<table>
<tbody>
<tr>
<td/>
<td/>
<td>
<table>
<tr>
<....>
</tr>
</table>
</td>
</tr>
</tbody>
</table>
</body>
In the above structure, <tbody> may or may not be present, and tables can be nested to any depth. I want to get all the rows of a given table: when I am processing the outer table, I want only the outer row (the one that contains three tds), not the inner tr inside the nested table. How can I do this using XSLT or XPath?
Edit: What I am essentially looking for is a way of getting all descendant::y of a node x, where y is not a descendant of another x; that is, the path from x to y should not contain another x. I may not have anything that distinguishes the outer x from the inner x.
Note: I need to do this for many HTML files that all have different structures, and I cannot change the structure of any of them; they are given to me. The only guarantee is that they are all well-formed XHTML.
Thanks for your help.
Upvotes: 4
Views: 782
Reputation: 243449
What I am essentially looking for is a way of getting all descendant::y for a node x, but y should not be a descendant of another x.
Suppose $n is the element named x. You want:
$n//y[count(ancestor::x) = count($n/ancestor-or-self::x)]
This selects all y that are descendants of $n and whose number of x ancestors is exactly one greater than the number of x ancestors of $n.
Because $n is itself an x element, this means that for every selected y, $n is the nearest x ancestor (ancestor::x[1]).
For practical purposes, you only have to replace $n above with the exact XPath expression that selects that x element.
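To make the substitution concrete, here is a minimal XSLT 1.0 sketch that applies the expression with x = table and y = tr. It assumes the elements are in no namespace, as in the question's sample; for real XHTML in the http://www.w3.org/1999/xhtml namespace, a prefix would have to be declared and used on every element name. The variable name rows and the text output are illustrative choices only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Visit every table, nested or not; each one plays the role of $n. -->
    <xsl:for-each select="//table">
      <xsl:variable name="n" select="."/>
      <!-- Rows that have exactly as many table ancestors as $n has,
           counting $n itself: no other table lies between $n and the row. -->
      <xsl:variable name="rows"
          select="$n//tr[count(ancestor::table) = count($n/ancestor-or-self::table)]"/>
      <xsl:text>table at nesting depth </xsl:text>
      <xsl:value-of select="count($n/ancestor-or-self::table)"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="count($rows)"/>
      <xsl:text> own row(s)&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample in the question (with the <....> placeholders removed), each of the two tables should report one own row. The comparison of counts does not depend on whether tbody is present, because tbody is not a table and so never changes the number of table ancestors.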
Upvotes: 0
Reputation: 60414
The following expression selects the tr elements of any table element that does not have a table as an ancestor (i.e. only the outermost tables), whether or not a tbody element is present:
//table[not(ancestor::table)]/tbody/tr|//table[not(ancestor::table)]/tr
This is the union of two separate expressions: one that selects the correct elements when tbody is present and another for when it is not.
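The same union can also be written relative to a table that is already the context node, which may be closer to the "when I am processing the outer table" situation in the question. The following XSLT 1.0 sketch is one possible arrangement, not the only one; it again assumes no-namespace element names, and the variable name rows and the text output are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Start from the outermost tables only, as in the expression above. -->
    <xsl:apply-templates select="//table[not(ancestor::table)]"/>
  </xsl:template>

  <xsl:template match="table">
    <!-- Union of the two cases: rows directly under the table,
         and rows under a tbody child of the table. -->
    <xsl:variable name="rows" select="tr | tbody/tr"/>
    <xsl:text>table with </xsl:text>
    <xsl:value-of select="count($rows)"/>
    <xsl:text> row(s)&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because tr | tbody/tr uses only child steps from the current table, it never reaches rows of a nested table, so the same template could be applied to tables at any nesting level by changing the select in the root template.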
Upvotes: 2