I am trying to scrape a table of products with Selenium.
Here is my example table:
<div class="article">
<table style="width: 100%">
<tbody><tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12900101" class="changeable">
<span>Product 1 </span>
</a>
</td>
<td class="trenner_lu">
11.11.1999
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 1</a>
</td>
<td class="trenner_lu">
1999$
</td>
</tr>
<tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12900347" class="changeable">
<span>Product 2 </span>
</a>
</td>
<td class="trenner_lu">
1.12.1944
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 2</a>
</td>
<td class="trenner_lu">
1234$
</td>
</tr>
<tr>
<td class="trenner_u"></td>
<td class="trenner_u">
<a href="/details/12908635" class="changeable">
<img class="positionable" src="/ImageImage/12908635" alt="" style="width: 100px; opacity: 0.9;">
<span>Product 1 </span>
<img src="/Content/images/icons/photo.png" alt="Foto">
</a>
</td>
<td class="trenner_lu">
05.12.1950
</td>
<td class="trenner_lu">
<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 2</a>
,<a title="Category Product Group" href="/grp/detailsSmallTB_iframe=true&height=132&width=420" class="thickbox">Group 4</a>
</td>
<td class="trenner_lu">
131282$
</td>
</tr>
</tbody></table>
</div>
I tried to scrape each element with:
List<WebElement> links = driver.findElements(By.xpath("//*[@id=\"home\"]/div[3]/table/tbody/tr/td[2]/a"));
List<WebElement> prodNames = driver.findElements(By.xpath("//*[@id=\"home\"]/div[3]/table/tbody/tr/td[2]/a"));
List<WebElement> group = driver.findElements(By.xpath("//*[@id=\"home\"]/div[3]/table/tbody/tr/td[4]/a"));
However, as you can see, one of my td elements contains two links, so my WebElement lists do not have the same length and are extremely hard to merge.
My desired output should look like this:
[Product 1, 11.11.1999, Group 1, 1999$], [Product 2, 1.12.1944, Group 2, 1234$], [Product 1, 05.12.1950, Group 2 Group 4, 131282$]
Any suggestions on how to scrape such a table more efficiently?
I appreciate your replies!
Upvotes: 0
Views: 649
You could iterate over the rows one at a time, which makes it clearer what you are doing. In Python it would be:
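from selenium.webdriver.common.by import By

rows = driver.find_elements(By.XPATH, "//*[@id=\"home\"]/div[3]/table/tbody/tr")
for row in rows:
    # "./td" is relative to this row; "//td" would match every cell in the document
    cells = row.find_elements(By.XPATH, "./td")
    product_name = cells[1].text
    # ... etc ...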
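To expand on the row-by-row idea, here is a minimal sketch that also handles a cell with several links by joining their texts into one string. The function name `scrape_rows` is made up for illustration, and the column positions (name in the 2nd td, date in the 3rd, groups in the 4th, price in the 5th) are assumptions taken from the example table, so treat it as a starting point rather than a tested solution for the real page.

```python
# By.XPATH in Selenium's Python bindings is the plain string "xpath"; the
# literal is used here so the sketch does not need Selenium at import time.
XPATH = "xpath"


def scrape_rows(table):
    """Return one [name, date, groups, price] list per <tr> under `table`.

    `table` may be a WebDriver or a WebElement: anything that exposes
    find_elements(by, value) and whose results have a .text attribute.
    """
    result = []
    for row in table.find_elements(XPATH, ".//tbody/tr"):
        cells = row.find_elements(XPATH, "./td")
        if len(cells) < 5:
            continue  # skip rows that do not match the assumed 5-column layout
        # A group cell may hold several links; collect them all for this row.
        groups = [a.text.strip() for a in cells[3].find_elements(XPATH, "./a")]
        result.append([
            cells[1].text.strip(),   # product name
            cells[2].text.strip(),   # date
            " ".join(groups),        # e.g. "Group 2 Group 4"
            cells[4].text.strip(),   # price
        ])
    return result
```

Because every value is read relative to its own row, the lists can never get out of step, no matter how many links a cell contains. You would call it with something like `scrape_rows(driver.find_element("xpath", "//div[@class='article']/table"))`, where that locator is again only a guess based on the posted HTML.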
Upvotes: 1
Think of everything you interact with as an object:
class Table {
    private static final String TABLE_CELL = "//table/tbody/tr[%d]/td[%d]";
    private final WebDriver driver;

    Table(WebDriver driver) {
        this.driver = driver;
    }

    // row and col are 1-based, as XPath positions are
    public String getTableCellText(int row, int col) {
        WebElement cell = driver.findElement(By.xpath(String.format(TABLE_CELL, row, col)));
        return cell.getText();
    }
}
You can use it as you see fit:
Table t = new Table(driver);
System.out.println(t.getTableCellText(3, 5)); // prints 131282$
Upvotes: 1