user2495518

Reputation: 1

Trying to find XPath for multiple TDs

I want to extract the Address for specific Numbers (the first TD) of this table. The only unique identifier for the table is the H3.

Here is the code for the table:

<table width="95%" cellpadding=5 cellspacing=0 border=1>
    <tr><td colspan="4"><h3>The list</td></tr>
    <tr>
        <td>Number</td><td>First Name</td>
        <td>Last Name</td><td>Address</td>
   </tr>

I have tried:

//table[@h3=’See this now’]/’tr/td[87] and td[107] and td[116]

I am new to xpath, and programming in general. It's pretty fun, but would love to be able to figure this one out!! Appreciate any help :D

Upvotes: 0

Views: 525

Answers (1)

Patrick Magee

Reputation: 2989

First, your HTML is wrong.

  • You did not close your table element.
  • You did not close your h3 element.
  • You must enclose your attribute values in quotes.

     <table width="95%" cellpadding="5" cellspacing="0" border="1"> 
       <tr> 
         <td colspan="4"> 
           <h3>The list</h3> 
         </td> 
       </tr>
       <tr> 
         <td>Number</td> 
         <td>First Name</td>  
         <td>Last Name</td> 
         <td>Address</td>
      </tr>
    </table>
    

Once you have fixed the formatting of your XHTML, you can traverse the document tree.

XPATH

Any h3 that is a direct child of a td, anywhere inside any table:

//table//td/h3

Will return

<h3>The list</h3>

For the number

//table//tr[2]/td[1]    <-- any table, the second tr element in this table, the first td in that second tr

Will return

<td>Number</td>
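
The two expressions above can be tried out with a short script. This is only a sketch, assuming Python and its standard-library xml.etree.ElementTree module (my choice, not something the question mentions). ElementTree supports a limited subset of XPath and expects paths relative to the element being searched, so the leading // becomes .// here.

```python
import xml.etree.ElementTree as ET

# The corrected table from above, parsed as well-formed XML.
XHTML = """
<table width="95%" cellpadding="5" cellspacing="0" border="1">
  <tr>
    <td colspan="4"><h3>The list</h3></td>
  </tr>
  <tr>
    <td>Number</td>
    <td>First Name</td>
    <td>Last Name</td>
    <td>Address</td>
  </tr>
</table>
"""

# The table is the root element in this snippet, so "//table//td/h3"
# is written relative to it as ".//td/h3".
table = ET.fromstring(XHTML)

# Every h3 that is a direct child of a td.
headings = [h3.text for h3 in table.findall(".//td/h3")]

# First td of the second tr.
first_cells = [td.text for td in table.findall(".//tr[2]/td[1]")]

print(headings)     # ['The list']
print(first_cells)  # ['Number']
```

A library with full XPath 1.0 support would accept the expressions exactly as written above; the relative form is only an ElementTree requirement.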

If we add multiple tables to a document and want the matching element from every table, this is quite simple. Say we have an XHTML document with many tables inside a parent element, for example a 'root' element.

<root>
    <table width="95%" cellpadding="5" cellspacing="0" border="1">
        <tr>
            <td colspan="4">
                <h3>The list</h3>
            </td>
        </tr>
        <tr>
            <td>123</td>
            <td>First Name</td>
            <td>Last Name</td>
            <td>Address</td>
        </tr>
    </table>
    <table width="95%" cellpadding="5" cellspacing="0" border="1">
        <tr>
            <td colspan="4">
                <h3>The list</h3>
            </td>
        </tr>
        <tr>
            <td>456</td>
            <td>First Name</td>
            <td>Last Name</td>
            <td>Address</td>
        </tr>
    </table>
    <table width="95%" cellpadding="5" cellspacing="0" border="1">
        <tr>
            <td colspan="4">
                <h3>The list</h3>
            </td>
        </tr>
        <tr>
            <td>789</td>
            <td>First Name</td>
            <td>Last Name</td>
            <td>Address</td>
        </tr>
    </table>
</root>

We can extract the number held in the first td of the second row of every table with the following XPath expression:

//table/tr[2]/td[1]

This will give us the result of

<td>123</td>
-----------------------
<td>456</td>
-----------------------
<td>789</td>
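
To check this result, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree module (an assumption on my part, the question does not name a language). The TABLE template is a hypothetical helper that rebuilds the three tables with their attributes dropped for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical template for one table; only the number differs between tables.
TABLE = """
<table>
  <tr><td colspan="4"><h3>The list</h3></td></tr>
  <tr><td>{number}</td><td>First Name</td><td>Last Name</td><td>Address</td></tr>
</table>
"""

doc = "<root>" + "".join(TABLE.format(number=n) for n in (123, 456, 789)) + "</root>"
root = ET.fromstring(doc)

# ElementTree wants a relative path, so "//table/tr[2]/td[1]" gets a leading dot.
numbers = [td.text for td in root.findall(".//table/tr[2]/td[1]")]

print(numbers)  # ['123', '456', '789']
```

The matches come back in document order, one per table.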

Now, say we have several tables, but only one of them matters to us: the one that contains an h3 element. No other table is of interest. If a table has an h3 element, we want to extract the first td of its second row.

<root>
    <table width="95%" cellpadding="5" cellspacing="0" border="1">
        <tr>
            <td colspan="4">
                <h4>Ignore me!</h4>
            </td>
        </tr>
        <tr>
            <td>1164961564896</td>
            <td>First Name</td>
            <td>Last Name</td>
            <td>Address</td>
        </tr>
    </table>
    <table width="95%" cellpadding="5" cellspacing="0" border="1">
        <tr>
            <td colspan="4">
                <h1>I'm not interesting</h1>
            </td>
        </tr>
        <tr>
            <td>456456466465</td>
            <td>First Name</td>
            <td>Last Name</td>
            <td>Address</td>
        </tr>
    </table>
    <table width="95%" cellpadding="5" cellspacing="0" border="1">
        <tr>
            <td colspan="4">
                <h3>IM THE IMPORTANT TABLE!</h3>
            </td>
        </tr>
        <tr>
            <td>123456789</td>
            <td>First Name</td>
            <td>Last Name</td>
            <td>Address</td>
        </tr>
    </table>
</root>

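We can accomplish this by finding the h3 element and then traversing back up the tree to its table (h3 to td, td to tr, tr to table) before stepping down into the rows again. Note that the expression below selects the first td of every row of that table, including the heading row; writing tr[2] in place of tr keeps only the second row.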

//table//h3/../../../tr/td[1]

Will return

<td colspan="4">
<h3>IM THE IMPORTANT TABLE!</h3>
</td>
-----------------------
<td>123456789</td>
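
Coming back to the original question (the Address for specific Numbers in the table identified by its h3), the following is a rough sketch rather than a definitive solution. It assumes Python's standard-library xml.etree.ElementTree, and addresses_for is a hypothetical helper name. Because ElementTree's XPath subset cannot express every predicate, the matching on the h3 text and on the Number is done in plain Python.

```python
import xml.etree.ElementTree as ET

# Shortened version of the document above (attributes dropped, two tables kept).
DOC = """
<root>
  <table>
    <tr><td colspan="4"><h4>Ignore me!</h4></td></tr>
    <tr><td>1164961564896</td><td>First Name</td><td>Last Name</td><td>Address</td></tr>
  </table>
  <table>
    <tr><td colspan="4"><h3>IM THE IMPORTANT TABLE!</h3></td></tr>
    <tr><td>123456789</td><td>First Name</td><td>Last Name</td><td>Address</td></tr>
  </table>
</root>
"""

root = ET.fromstring(DOC)

# The expression above (relative form): first td of every row, heading row included.
all_first_cells = root.findall(".//table//h3/../../../tr/td[1]")

# With tr[2], only the data row of the h3 table is matched.
second_row = [td.text for td in root.findall(".//table//h3/../../../tr[2]/td[1]")]


def addresses_for(root, heading, wanted_numbers):
    """Map Number -> Address for rows of the table whose h3 text equals heading."""
    found = {}
    for table in root.iter("table"):
        h3 = table.find(".//h3")
        if h3 is None or h3.text != heading:
            continue  # not the table we are looking for
        for row in table.findall("tr"):
            cells = row.findall("td")
            # Data rows have four cells: Number, First Name, Last Name, Address.
            if len(cells) >= 4 and cells[0].text in wanted_numbers:
                found[cells[0].text] = cells[3].text
    return found


result = addresses_for(root, "IM THE IMPORTANT TABLE!", {"123456789"})

print(len(all_first_cells))  # 2
print(second_row)            # ['123456789']
print(result)                # {'123456789': 'Address'}
```

The Address value printed here is just the placeholder text from the sample table; with real data it would be the fourth cell of each matching row.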

Upvotes: 1
