Carol.Kar

Reputation: 5355

Scraping all cell values as strings from a table

I am trying to get some data from a member details page on a site I built some time ago.

However, not all of these pages look the same. They are built in the background from tables: if data exists, the table is added; if not, the table is left out.

Furthermore, the tables do not have a fixed length; rows are missing when certain fields do not exist.

The body of such a table looks like this:

    <tbody><tr>
      <td style="width: 115px; vertical-align: top;">
        <img src="/Image/1231" alt="" style="width:100px;"><br>
        Hamburg<br>
        <br>
      </td>
      <td class="trenner_l" style="vertical-align: text-top;">
        <table style="width: 100%;">
          <tbody><tr>
            <td colspan="4" class="trenner_u"></td>
          </tr>
            <tr style="height: 8px;">
              <td style="vertical-align: middle;">
                  <img src="/Content/images/floasdfh_ain.png" title="memb" height="16">
                  &nbsp;
              </td>
              <td style="vertical-align: top;">
                vlg.&nbsp;minao
              </td>
              <td class="trenner_l">
                <a href="/memb/DetailSmall/daTB_iframe=true&amp;height=132&amp;width=420" class="thickbox" >
                  Cate1</a> (21.03.1928)
              </td>
              <td class="trenner_l" style="vertical-align: top;">
                UP,&nbsp;FORUM
              </td>
            </tr>
            <tr style="height: 8px;">
              <td style="vertical-align: middle;">
                &nbsp;
              </td>
              <td style="vertical-align: top;">
                name.&nbsp;minao
              </td>
              <td class="trenner_l">
                <a href="/Verband/DetailSmall/jhkg?TB_iframe=true&amp;height=132&amp;width=420" class="thickbox" >Zone
                  1</a> 
              </td>
              <td class="trenner_l" style="vertical-align: top;">
                Z1,&nbsp;CV
              </td>
            </tr>
            <tr style="height: 8px;">
              <td style="vertical-align: middle;">
                &nbsp;
              </td>
              <td style="vertical-align: top;">
                vlg.&nbsp;meno
              </td>
              <td class="trenner_l">
                <a href="/Verband/DetailSmall/asdfasd?TB_iframe=true&amp;height=132&amp;width=420" class="thickbox" >K.D.St.V.
                  Zone2</a> 
              </td>
              <td class="trenner_l" style="vertical-align: top;">
                Z1,&nbsp;Forum
              </td>
            </tr>

          <tr>
            <td colspan="4" class="trenner_o"></td>
          </tr>
            <tr>
              <td colspan="2">
                Mobiltelefon privat:&nbsp;
              </td>
              <td colspan="2" class="trenner_l">
                <a href="tel:+22341123124">+22341123124</a>
              </td>
            </tr>
            <tr>
              <td colspan="4" class="trenner_o"></td>
            </tr>
            <tr>
              <td colspan="2">email:
              </td>
              <td colspan="2" class="trenner_l">
                <a href="mailto:[email protected]">[email protected]</a>
              </td>
            </tr>
            <tr>
              <td colspan="4" class="trenner_o"></td>
            </tr>
            <tr>
              <td>
                <img src="/Content/images/icons/map.png">
              </td>
              <td style="vertical-align: top;">
                adress:&nbsp;
              </td>
              <td colspan="2" class="trenner_l" style="vertical-align: top;">
                Teststreet 2, 243423&nbsp;City, State 


              </td>
            </tr>
        </tbody></table>


        <br>
          <div class="TextSmall">online 12.04.2013</div>
      </td>
    </tr>
  </tbody>

As I only need the data that is available, my idea is to get all the string information in such a table.

I tried the following:

    for (int j = 0; j < list.size(); j++) {
        // j is the loop index into the list of member pages
        String link = list.get(j).getLinkToGVPage();
        openSite(link);
        try {
            // Print the text of every top-level cell in the table
            List<WebElement> cells = driver.findElements(By.xpath("//*[@id=\"ui-id-4\"]/table/tbody/tr/td"));
            for (int k = 0; k < cells.size(); k++) {
                System.out.println(cells.get(k).getText());
            }

            // Print the first cell of the sixth row on its own
            WebElement adresse = driver.findElement(By.xpath("//*[@id=\"ui-id-4\"]/table/tbody/tr[6]/td"));
            System.out.println(adresse.getText());
        } catch (Exception e) {
            System.out.println("exceptions");
            e.printStackTrace();
        }
        try {
            // Pause before opening the next page
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

However, I get nothing back. Any suggestions on how to get only the string values back from the table, so that I can save them in a string variable?

I appreciate your replies!

Upvotes: 2

Views: 47

Answers (1)

Cronax

Reputation: 355

I personally use Python rather than Java, but the general principle seems the same to me: I would check for every string separately using an if/else-if construct and then save the ones you find to a variable.
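
A rough sketch of that if/else-if idea in Java, untested against the real page. It assumes the cell texts of each row have already been read (for example with `getText()`) into a list of strings per row; the class name `MemberCellSorter`, the methods `clean` and `classify`, and the keys `mobile`, `email` and `address` are made up for illustration. The labels it matches on are the ones visible in the sample table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: sorts already-extracted cell texts into named fields.
public class MemberCellSorter {

    // Cell text may still contain non-breaking spaces (from &nbsp;),
    // so turn them into normal spaces before trimming.
    static String clean(String raw) {
        return raw.replace('\u00A0', ' ').trim();
    }

    // Each inner list holds the cell texts of one table row.
    static Map<String, String> classify(List<List<String>> rows) {
        Map<String, String> found = new LinkedHashMap<>();
        for (List<String> row : rows) {
            // Drop empty cells (icons, separator rows).
            List<String> cells = new ArrayList<>();
            for (String cell : row) {
                String text = clean(cell);
                if (!text.isEmpty()) {
                    cells.add(text);
                }
            }
            if (cells.size() < 2) {
                continue; // no label/value pair in this row
            }
            String label = cells.get(0).toLowerCase();
            String value = cells.get(cells.size() - 1);

            // Check for every known label separately; rows with
            // other labels are skipped.
            if (label.startsWith("mobiltelefon")) {
                found.put("mobile", value);
            } else if (label.startsWith("email")) {
                found.put("email", value);
            } else if (label.startsWith("adress")) {
                found.put("address", value);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample rows taken from the table in the question.
        List<List<String>> rows = List.of(
            List.of("Mobiltelefon privat:\u00A0", "+22341123124"),
            List.of(""),
            List.of("", "adress:\u00A0", "Teststreet 2, 243423\u00A0City, State"));
        System.out.println(classify(rows));
    }
}
```

Fields that are missing on a page simply never end up in the map, so the varying table length does not matter. The matching is only as reliable as the label texts, though.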

The problem, looking at your table, is that these fields seem to have no unique identifiers, meaning it will be very hard to correctly identify them. If you are able to adapt the code that generates the page (or have someone do this for you), I would give each type of cell that you want to be able to detect its own class.

Upvotes: 1
