Reputation: 439
Here is what the table looks like on the web page (it's just one column):
Here is the HTML of the table I am trying to scrape:
If it matters, that table is nested within another table.
Here is my code:
def filter_changed_records():
    # Scrape webpage for addresses from table of changed properties
    row_number = 0
    results_frame = locate_element('//*[@id="oGridFrame"]')
    driver.switch_to.frame(results_frame)
    while True:
        try:
            address = locate_element("id('row" + str(row_number) +
                                     "FC')/x:td")
            print(address)
            changed_addresses.append(address)
            row_number += 1
        except:
            print("No more addresses to add.")
            break
As you can see, there is a <tr> tag with an id of row0FC. This table is dynamically generated, and each new <tr> gets an id with an increasing number: row0FC, row1FC, row2FC, etc. That is how I planned on iterating through all the entries and adding them to a list.
My locate_element function is the following:
# Imports required by the helper
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def locate_element(path):
    element = WebDriverWait(driver, 50).until(
        EC.presence_of_element_located((By.XPATH, path)))
    return element
It always times out after 50 seconds because the element is never found. I am unsure how to proceed. Is there a better way of locating the element?
SOLUTION BY ANDERSSON
address = locate_element("//tr[@id='row%sFC']/td" % row_number).text
Upvotes: 3
Views: 1752
Reputation: 52665
Your XPath seems to be incorrect. Try the following:
address = locate_element("//tr[@id='row%sFC']/td" % row_number)
Also note that address is a WebElement. If you want to get its text content, you should use
address = locate_element("//tr[@id='row%sFC']/td" % row_number).text
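Put together with the loop from the question, the whole function might look like this (just a sketch: the explicit TimeoutException import and the local changed_addresses list are my adjustments, not part of the original code):

from selenium.common.exceptions import TimeoutException

def filter_changed_records():
    # Scrape webpage for addresses from the table of changed properties
    changed_addresses = []
    results_frame = locate_element('//*[@id="oGridFrame"]')
    driver.switch_to.frame(results_frame)
    row_number = 0
    while True:
        try:
            # Corrected XPath: match the <tr> by id, then take its <td>
            address = locate_element("//tr[@id='row%sFC']/td" % row_number).text
            changed_addresses.append(address)
            row_number += 1
        except TimeoutException:
            # No row with this id appeared within the wait, so we ran out of rows
            print("No more addresses to add.")
            break
    return changed_addresses

One caveat: locate_element waits the full 50 seconds before the final TimeoutException fires, so the loop always ends with a long pause; using a shorter wait for the per-row lookup would avoid that.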
Upvotes: 3
Reputation: 740
Parsing HTML with Selenium is slow. I would use BeautifulSoup for that.
Assuming you have already loaded the page in driver, it would be something like:
from bs4 import BeautifulSoup
....
soup = BeautifulSoup(driver.page_source, "html.parser")
td_list = soup.find_all('td')
for td in td_list:
    try:
        addr = td['title']
        print(addr)
    except KeyError:
        # Skip <td> elements that have no title attribute
        pass
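Since the table is nested inside another table, find_all('td') will also pick up cells from the outer table. One way to narrow the selection (a sketch; the ends-with selector assumes every target row id really ends in FC, and note that driver.page_source only contains the grid's HTML after you have switched into the oGridFrame frame):

# Select only the <td> cells inside rows whose id ends with "FC"
for td in soup.select('tr[id$="FC"] > td'):
    # Prefer the title attribute; fall back to the cell's text
    print(td.get('title', td.get_text(strip=True)))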
Upvotes: -1