Anupama

Reputation: 167

How to get the rows of a nested HTML table using XSLT

I am trying to get the table rows from an XHTML document using XPath / XSLT. My sample XHTML looks like this:

<body>
<....>
   <table>
     <tbody>
       <tr>
         <td/>
         <td/>
         <td>
            <table>
              <tr>
                <....>
              </tr>
            </table>
         </td>
       </tr>
     </tbody>
   </table>
</body>

In the above structure, <tbody> may or may not be present, and tables can be nested to any level. I want to get all the rows of a given table. So when I am processing the outer table, I want only the outer row (the one that contains three td elements), not the inner tr (inside the nested table). How can I do this using XSLT or XPath?

Edit : What I am essentially looking for is a way of getting all descendant::y for a node x, but y should not be a descendant of another x. The path from x->y should not contain another x. I may not have anything that distinguishes the outer x from the inner x.

Note: I am trying to do this with many HTML files that all have different structures, and I cannot change the structure of any of them; they are given to me. The only guarantee is that they are all well-formed XHTML.

Thanks for your help.

Upvotes: 4

Views: 782

Answers (2)

Dimitre Novatchev

Reputation: 243449

"What I am essentially looking for is a way of getting all descendant::y for a node x, but y should not be a descendant of another x."

Suppose $n is the element named x. You want:

$n//y[count(ancestor::x) = count($n/ancestor-or-self::x)]

This selects all y that are descendants of $n and whose number of x ancestors is exactly one greater than the number of x ancestors of $n.

Every x ancestor of $n is also an ancestor of each such y, and $n itself is an x, which already accounts for that count. So no other x can lie on the path from $n to a selected y: for every selected y, $n is its nearest ancestor::x.

For your practical purposes, you only have to substitute $n above with the exact XPath expression that selects the x element you are processing.
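
As a rough illustration, the expression might be used in an XSLT 1.0 stylesheet like the sketch below, with x = table and y = tr. It assumes the elements are in no namespace, as in the sample; if the real documents declare the XHTML namespace, the names would need a prefix bound to it. The text output is only there to make the selection visible.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Visit every table in the document, at any nesting depth. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//table"/>
  </xsl:template>

  <xsl:template match="table">
    <!-- $n plays the role of x: the table currently being processed. -->
    <xsl:variable name="n" select="."/>
    <!-- Rows below $n that have no other table between them and $n. -->
    <xsl:variable name="rows"
        select="$n//tr[count(ancestor::table) = count($n/ancestor-or-self::table)]"/>
    <xsl:text>table at depth </xsl:text>
    <xsl:value-of select="count($n/ancestor-or-self::table)"/>
    <xsl:text>: </xsl:text>
    <xsl:value-of select="count($rows)"/>
    <xsl:text> own row(s)&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Because the predicate compares ancestor counts rather than fixed paths, the same template works whether or not tbody is present.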

Upvotes: 0

Wayne

Reputation: 60414

The following expression selects the tr elements of every table element that does not have a table as an ancestor (i.e. the outermost tables only), whether or not a tbody element is present:

//table[not(ancestor::table)]/tbody/tr|//table[not(ancestor::table)]/tr

This is the union of two separate expressions: one selects the rows when tbody is present, the other when it is not.
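
A minimal sketch of how this union could be used from an XSLT 1.0 stylesheet follows. The output element names rows and row and the cells attribute are made up for illustration, and the elements are assumed to be in no namespace, as in the sample.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- rows, row and cells are hypothetical names for the output. -->
    <rows>
      <!-- Rows of outermost tables only, with or without tbody. -->
      <xsl:for-each select="//table[not(ancestor::table)]/tbody/tr
                          | //table[not(ancestor::table)]/tr">
        <!-- count(td) counts only the direct td children of this row. -->
        <row cells="{count(td)}"/>
      </xsl:for-each>
    </rows>
  </xsl:template>
</xsl:stylesheet>
```

When the context node is already a table, for example inside a template matching table, the relative union tr | tbody/tr plays the same role for a table at any depth, assuming rows sit only directly under table or under tbody.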

Upvotes: 2
