Reputation: 563
I am having problems implementing something that equates to a do-while loop.
PROBLEM DESCRIPTION
I am scraping a site and the results pages are paginated, i.e.
1, 2, 3, 4, 5, .... NEXT
I am iterating through the pages using a test condition for the existence of the NEXT link. If there is only one results page, there is no NEXT link, so I will just scrape that first page. If there is more than one page, the last page also has no NEXT link, so the scraper function would also work on that page. The scraping function is called findRecords().
So I am isolating my NEXT link using:
next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
So I want to run a loop that performs the scrape at least once (whether there is one results page or several). I am also clicking the NEXT link using its click() method. The code I have so far is:
while True:
    findRecords()
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if not next_link:
        break
    next_link.click()
This is not working. Well, it works and it scrapes, but when it reaches the last page it gives me a NoSuchElementException as follows:
Traceback (most recent call last):
  File "try.py", line 47, in
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 752, in find_element
    'value': value})['value']
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']"}
  (Session info: chrome=53.0.2785.89)
  (Driver info: chromedriver=2.20.353124 (035346203162d32c80f1dce587c8154a1efa0c3b),platform=Linux 3.13.0-92-generic x86_64)
I know it's true that the element does not exist on that last page because, like I said before, the NEXT link does not exist on the last page.
So how do I fix my while loop so that it can scrape a single-page result and/or that last page when the condition is not true, and elegantly break out of the loop without giving me that hideous error?
PS: Other than the while loop above, I have also tried the following:
is_continue = True
while is_continue:
    findRecords()
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if next_link:
        is_continue = True
        next_link.click()
    else:
        is_continue = False
And if it is any help, here is my scraper function findRecords() as well:
def findRecords():
    filename = "sam_" + letter + ".csv"
    bsObj = BeautifulSoup(driver.page_source, "html.parser")
    tableList = bsObj.find_all("table", {"class": "width100 menu_header_top_emr"})
    tdList = bsObj.find_all("td", {"class": "menu_header width100"})
    for table, td in zip(tableList, tdList):
        a = table.find_all("span", {"class": "results_body_text"})
        b = td.find_all("span", {"class": "results_body_text"})
        with open(filename, "a") as csv_file:
            csv_file.write(', '.join(tag.get_text().strip() for tag in a + b) + '\n')
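To see the pairing logic of findRecords() in isolation, here is a sketch with plain lists standing in for the BeautifulSoup results and an in-memory buffer standing in for the CSV file (the sample strings are hypothetical):

```python
import io

# Hypothetical stand-ins for the span texts pulled from each table/td pair.
tableList = [["Acme Corp", "Active"], ["Beta LLC", "Expired"]]
tdList = [["123 Main St"], ["456 Oak Ave"]]

buf = io.StringIO()  # stands in for the open CSV file
for a, b in zip(tableList, tdList):
    # Mirrors the question's row format: all texts joined with ', '.
    buf.write(', '.join(text.strip() for text in a + b) + '\n')

print(buf.getvalue())
```

Note that zip() pairs the nth table with the nth td, so rows stay aligned only as long as the two lists have matching lengths.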
Upvotes: 0
Views: 79
Reputation: 23815
You should try using find_elements instead; it returns either a list of WebElements or an empty list. So just check its length, as below:
while True:
    findRecords()
    next_link = driver.find_elements(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if len(next_link) == 0:
        break
    next_link[0].click()
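The control flow of that loop can be checked without a browser by substituting a minimal stand-in for the driver. FakePager below is hypothetical and only mimics the one behaviour that matters here: find_elements returns an empty list on the last (or only) page:

```python
class FakePager:
    """Hypothetical stand-in for the WebDriver: N result pages, no Next on the last."""

    def __init__(self, pages=3):
        self.page = 1
        self.pages = pages

    def find_elements(self, xpath):
        # Like WebDriver.find_elements: empty list when Next is missing.
        return ["next-link"] if self.page < self.pages else []

    def click_next(self):
        self.page += 1


driver = FakePager()
scraped = []

while True:                      # emulates a do-while: body runs at least once
    scraped.append(driver.page)  # stands in for findRecords()
    if not driver.find_elements("//a[contains(text(),'Next')]"):
        break                    # empty list on the last (or only) page
    driver.click_next()

print(scraped)  # [1, 2, 3] -- every page scraped once, including the last
```

With pages=1 the loop still runs once before breaking, which is exactly the do-while behaviour the question asks for.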
Upvotes: 1
Reputation: 9058
When you are searching for the next link, change the code to find_elements, which returns a list of size 1 if Next is present, or a list of size 0, but raises no exception:
next_link = driver.find_elements(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
You now need to put in place logic to access the Next WebElement from this list.
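That access logic might look like the sketch below; click_next_if_present and StubElement are hypothetical names, with StubElement standing in for a Selenium WebElement so the guard-then-index pattern can be exercised without a browser:

```python
def click_next_if_present(links):
    """Click the first Next element if the list is non-empty.

    Returns True when a click happened (more pages remain), False otherwise.
    """
    if links:            # size-1 list when Next exists, [] on the last page
        links[0].click()
        return True
    return False


class StubElement:
    """Hypothetical stand-in for a WebElement that records clicks."""

    def __init__(self):
        self.clicked = False

    def click(self):
        self.clicked = True


el = StubElement()
print(click_next_if_present([el]), el.clicked)  # True True
print(click_next_if_present([]))                # False
```

In the real scraper, the loop would call findRecords() first and break as soon as click_next_if_present returns False.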
Upvotes: 2