Carol.Kar

Reputation: 5355

Scrape td attribute rows with selenium

I am trying to scrape a table of products with Selenium.

Here is my example table:

<div class="article">
  <table style="width: 100%">
        <tbody><tr>
          <td class="trenner_u"></td>
          <td class="trenner_u">
            <a href="/details/12900101" class="changeable">
              <span>Product 1 </span>
            </a>
          </td>
          <td class="trenner_lu">
            11.11.1999
          </td>
          <td class="trenner_lu">
            <a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&amp;height=132&amp;width=420" class="thickbox">Group 1</a>
          </td>
          <td class="trenner_lu">
1999$
          </td>
        </tr>
        <tr>
          <td class="trenner_u"></td>
          <td class="trenner_u">
            <a href="/details/12900347" class="changeable">
              <span>Product 2 </span>
            </a>
          </td>
          <td class="trenner_lu">
            1.12.1944
          </td>
          <td class="trenner_lu">
            <a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&amp;height=132&amp;width=420" class="thickbox">Group 2</a>
          </td>
          <td class="trenner_lu">
1234$
          </td>
        </tr>
        <tr>
          <td class="trenner_u"></td>
          <td class="trenner_u">
            <a href="/details/12908635" class="changeable">
                <img class="positionable" src="/ImageImage/12908635" alt="" style="width: 100px; opacity: 0.9;">  
              <span>Product 1 </span>
  <img src="/Content/images/icons/photo.png" alt="Foto">
            </a>
          </td>
          <td class="trenner_lu">
            05.12.1950
          </td>
          <td class="trenner_lu">
                        <a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&amp;height=132&amp;width=420" class="thickbox">Group 2</a>
,<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&amp;height=132&amp;width=420" class="thickbox">Group 4</a>

          </td>
          <td class="trenner_lu">
131282$
          </td>
        </tr>
        
  </tbody></table>
</div>

I tried to scrape each element with:

    List<WebElement> links = driver.findElements(By.xpath("//*[@id=\"home\"]/div[3]/table/tbody/tr/td[2]/a"));
    List<WebElement> prodNames = driver.findElements(By.xpath("//*[@id=\"home\"]/div[3]/table/tbody/tr/td[2]/a"));
    List<WebElement> group = driver.findElements(By.xpath("//*[@id=\"home\"]/div[3]/table/tbody/tr/td[4]/a"));
    

However, as you can see, one of my td elements contains two links, so my WebElement lists do not have the same length and are extremely hard to merge.

My desired list output should look like that:

[Product 1, 11.11.1999, Group 1, 1999$], [Product 2, 1.12.1944, Group 2, 1234$], [Product 1, 05.12.1950, Group 2 Group 4, 131282$]

Any suggestions on how to scrape such a table more efficiently?

I appreciate your replies!

Upvotes: 0

Views: 649

Answers (2)

karina

Reputation: 805

You could iterate through each row, which makes it clearer what you are doing. In Python it would be:

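    from selenium.webdriver.common.by import By

    # One element per table row
    rows = driver.find_elements(By.XPATH, "//*[@id=\"home\"]/div[3]/table/tbody/tr")
    for row in rows:
        # "./td" is relative to this row; "//td" would search the whole document
        cells = row.find_elements(By.XPATH, "./td")
        product_name = cells[1].text
        # ... and so on for the remaining cells ...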
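Since the question is in Java, here is a rough sketch of the merging step once the cell texts of one row have been read. It deliberately leaves Selenium out: `RowRecords`, `joinGroups` and `toRecord` are made-up names, and the sample text `"Group 2 ,Group 4"` is only my assumption of what `getText()` might return for a cell with two links.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class RowRecords {

    // Collapses a multi-link cell text such as "Group 2 ,Group 4" into "Group 2 Group 4".
    static String joinGroups(String cellText) {
        return Arrays.stream(cellText.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.joining(" "));
    }

    // cellTexts holds the text of every <td> of one row, in column order
    // (index 0 is the empty leading cell of the example table).
    static List<String> toRecord(List<String> cellTexts) {
        List<String> record = new ArrayList<>();
        record.add(cellTexts.get(1).trim());      // product name
        record.add(cellTexts.get(2).trim());      // date
        record.add(joinGroups(cellTexts.get(3))); // one or more groups
        record.add(cellTexts.get(4).trim());      // price
        return record;
    }

    public static void main(String[] args) {
        // Assumed cell texts for the third row of the example table
        List<String> thirdRow = Arrays.asList("", "Product 1 ", "05.12.1950", "Group 2 ,Group 4", "131282$");
        System.out.println(toRecord(thirdRow)); // prints [Product 1, 05.12.1950, Group 2 Group 4, 131282$]
    }
}
```

Because every record is built from a single row, a cell with two links no longer throws the lists out of step.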

Upvotes: 1

user4925383

Reputation:

Think of everything you interact with as an object:

    class Table {
        private static final String TABLE_CELL = "//table/tbody/tr[%d]/td[%d]";

        private final WebDriver driver;

        Table(WebDriver driver) {
            this.driver = driver;
        }

        // row and col are 1-based, as XPath positions are
        public String getTableCellText(int row, int col) {
            WebElement cell = driver.findElement(By.xpath(String.format(TABLE_CELL, row, col)));
            return cell.getText();
        }
    }

You can use it as you see fit:

    Table t = new Table(driver);
    System.out.println(t.getTableCellText(3, 5)); // prints 131282$
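To make the indexing explicit, here is a small sketch of the XPaths that the `TABLE_CELL` template produces. `CellXPaths`, `cellXPath` and `dataCellXPaths` are hypothetical names, and the column range 2..5 assumes the layout of the example table, where the first cell of each row is empty.

```java
import java.util.ArrayList;
import java.util.List;

class CellXPaths {
    private static final String TABLE_CELL = "//table/tbody/tr[%d]/td[%d]";

    // XPath positions are 1-based, so row 1 / col 1 is the first cell.
    static String cellXPath(int row, int col) {
        return String.format(TABLE_CELL, row, col);
    }

    // XPaths for the four data columns (2..5) of one row, skipping the empty first cell.
    static List<String> dataCellXPaths(int row) {
        List<String> paths = new ArrayList<>();
        for (int col = 2; col <= 5; col++) {
            paths.add(cellXPath(row, col));
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(cellXPath(3, 5)); // prints //table/tbody/tr[3]/td[5]
        System.out.println(dataCellXPaths(3));
    }
}
```

Looping over the rows and passing each generated path to `driver.findElement` would give one record per row, at the cost of one lookup per cell.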

Upvotes: 1
