Reputation: 11
I am trying to scrape a table with Selenium and XPath in Python to get the "SIRET" row. I have tried several XPath expressions without success. One problem is that the position of the class="reportRow " elements changes dynamically, so the row cannot be selected by its position number. Can the "SIRET" row and the values of its "td" sub-elements be scraped by matching the "SIRET" text, or in some other way?
These are the manual steps I follow when I access the site:
The site consists only of the root domain. After logging in, I enter a search criterion, which opens a page where I have to click a link that opens a popup window with a table. The table contains 4 rows and 8 columns: the first row holds the column names, and the other 3 rows hold data like the "SIRET" one. The position of those 3 rows changes regularly, depending on the data received from a specific server. That is why I want to scrape that row and its values by the "SIRET" text.
My final scraped data should look like this: SIRET 646 90 0.2% $2.94 1.03 0.07 4.52.
Thank you very much for your input.
<div class="table_container">
<table>
<tbody>
<tr class="reportHead">.....</tr></tbody>
<tbody>
<tr class="reportRow ">....</tr>
<tr class="reportRow ">....</tr>
<tr class="reportRow ">
<td data-actual="SIRET" class="reportKeyword">SIRET</td>
<td class="td2">646</td>
<td class="td1">90</td>
<td class="rcr">0.2%</td>
<td class="td1">$2.94</td>
<td class="td1">1.03</td>
<td class="td1">0.07</td>
<td class="td1 rctl">4.52</td>
</tr>
</tbody>
<tfoot style="display: none;">....</tfoot>
</table>
Upvotes: 1
Views: 1554
Reputation: 185
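Strange. As a matter of fact, the solution is not that intricate. Select the cell by its data-actual attribute, step up to its row, and take all of that row's cells (note find_elements, plural, so every cell is returned rather than only the first):
driver.find_elements_by_xpath("//td[@data-actual='SIRET']/../td")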
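To turn the matched cells into the flat list of values the question asks for, something along these lines might work. It is only a sketch: read_row and SIRET_CELLS_XPATH are names made up for illustration, and it assumes a driver that offers find_elements(by, value), where the string "xpath" is the value of Selenium's By.XPATH constant. It also assumes the popup table has already finished loading.

```python
from typing import List

# Hypothetical constant: every <td> of the row that contains a cell whose
# data-actual attribute is "SIRET", wherever that row sits in the table.
SIRET_CELLS_XPATH = "//tr[td[@data-actual='SIRET']]/td"


def read_row(driver, xpath: str = SIRET_CELLS_XPATH) -> List[str]:
    """Return the stripped text of every element matched by `xpath`.

    `driver` is assumed to be a Selenium WebDriver (or anything exposing
    find_elements(by, value)); "xpath" equals selenium's By.XPATH.
    """
    cells = driver.find_elements("xpath", xpath)
    return [cell.text.strip() for cell in cells]
```

With a live driver, " ".join(read_row(driver)) would give a single line of the row's values. If the table is filled in after the popup opens, an explicit wait before calling read_row is probably needed.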
Upvotes: 0
Reputation: 193338
If I have understood the question correctly, you are trying to get the string "SIRET" from the <td> node, which changes dynamically. To do that, you can use the following line of code:
print(driver.find_element_by_xpath("//td[@class='reportKeyword']").get_attribute("innerHTML"))
Upvotes: 0
Reputation: 4749
You can use an XPath like this:
SIRET = driver.find_element_by_xpath("//td[@data-actual='SIRET']")
Then you can use the .text attribute to get the text.
If the data changes dynamically, you have to use:
SIRET = driver.find_element_by_xpath("//td[@class='reportKeyword']")
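Since the class-based locator matches the keyword cell of every row, one way to stay independent of row order is to index the rows by their keyword and then look up "SIRET". The sketch below checks that idea without a browser, using only the standard library on a trimmed copy of the markup from the question. rows_by_keyword is a hypothetical helper, and the two PLACEHOLDER rows are made up to stand in for the elided ones.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the table from the question. PLACEHOLDER_A and
# PLACEHOLDER_B are invented rows standing in for the elided "...." rows.
TABLE = """
<table>
  <tbody>
    <tr class="reportRow "><td class="reportKeyword">PLACEHOLDER_A</td><td class="td2">0</td></tr>
    <tr class="reportRow "><td class="reportKeyword">PLACEHOLDER_B</td><td class="td2">0</td></tr>
    <tr class="reportRow ">
      <td data-actual="SIRET" class="reportKeyword">SIRET</td>
      <td class="td2">646</td>
      <td class="td1">90</td>
      <td class="rcr">0.2%</td>
      <td class="td1">$2.94</td>
      <td class="td1">1.03</td>
      <td class="td1">0.07</td>
      <td class="td1 rctl">4.52</td>
    </tr>
  </tbody>
</table>
"""


def rows_by_keyword(markup):
    """Map the text of each row's first cell to the list of all its cell texts."""
    root = ET.fromstring(markup.strip())
    rows = {}
    for tr in root.iter("tr"):
        cells = [(td.text or "").strip() for td in tr.findall("td")]
        if cells:  # skip rows without <td> cells, such as a header row
            rows[cells[0]] = cells
    return rows


siret = rows_by_keyword(TABLE)["SIRET"]
print(" ".join(siret))  # SIRET 646 90 0.2% $2.94 1.03 0.07 4.52
```

The same grouping could be done in Selenium by looping over the reportRow elements and reading the text of each row's cells; the lookup by keyword stays the same no matter where the server places the row.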
Upvotes: 2