bqui56

Reputation: 2121

Only get parent rows of nested tables

I have a document I am parsing with Jsoup which has a structure like:

  <body>
      <table cellspacing="0">
         <tr>
            <td>one</td>
         </tr>
         <tr>
            <td>two</td>
         </tr>
         <tr>
            <td>
               <table cellspacing="0">
                  <tr>
                     <td>inner one</td>
                     <td>inner two</td>
                  </tr>
                  <tr>
                     <td>inner three</td>
                     <td>inner four</td>
                  </tr>
               </table>
            </td>
         </tr>
      </table>
   </body>

There are no ids or other attributes to distinguish the inner table from the outer table on the page.

I want to loop through each row of the outer table that does not contain a nested table. Currently I have:

Elements rows = document.select("tr");
for (Element row : rows) {
...
}

But of course this returns the row containing the nested table as well as the rows of the inner table, so I can't simply check whether the current row contains a table and continue in the loop.

How can I get rows one and two from the main table only, skipping row three and its inner rows?

Upvotes: 0

Views: 1190

Answers (1)

jama

Reputation: 325

This isn't the most elegant solution, but it worked for me:

Elements rows = document.select("body > table > tbody > tr:not(:has(table))");
for (Element row : rows) {
...
}

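What's really odd is that I copied your HTML and still had to include tbody in the selector. If I just did Elements rows = document.select("body > table > tr:not(:has(table))"); it wouldn't match anything. That is most likely because jsoup's HTML parser follows the HTML5 parsing rules and wraps the rows in an implicit tbody, so the tr elements are no longer direct children of table.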

Printing out the results I got:

<tr> 
 <td>one</td> 
</tr>
<tr> 
 <td>two</td> 
</tr>
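
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same selector. It assumes jsoup is on the classpath; the class name OuterRows and the helper outerRowsWithoutTables are hypothetical names made up for this illustration, and the HTML is a shortened copy of the structure from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class OuterRows {

    // Hypothetical helper: returns rows of top-level tables (direct children
    // of <body>) that do not contain a nested <table>.
    static Elements outerRowsWithoutTables(String html) {
        Document document = Jsoup.parse(html);
        // "tbody" is part of the chain because the parser adds it implicitly;
        // the ">" combinators keep rows of the nested table from matching.
        return document.select("body > table > tbody > tr:not(:has(table))");
    }

    public static void main(String[] args) {
        String html =
              "<body><table cellspacing=\"0\">"
            + "<tr><td>one</td></tr>"
            + "<tr><td>two</td></tr>"
            + "<tr><td><table cellspacing=\"0\">"
            + "<tr><td>inner one</td><td>inner two</td></tr>"
            + "<tr><td>inner three</td><td>inner four</td></tr>"
            + "</table></td></tr>"
            + "</table></body>";

        // Print the text of each matching outer row.
        for (Element row : outerRowsWithoutTables(html)) {
            System.out.println(row.text());
        }
    }
}
```

The child combinators do the real work here: a plain "table tr" (descendant) selector would also match the inner rows, since they are descendants of the outer table too.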

Upvotes: 2
