Tendekai Muchenje

Reputation: 563

Implementing a modified do-while loop in Python, i.e. run at least once and one more time at the end of the loop?

I am having problems implementing something equivalent to a do-while loop.

PROBLEM DESCRIPTION

I am scraping a site and the results pages are paginated, i.e.

1, 2, 3, 4, 5, .... NEXT

I am iterating through the pages, using the existence of the NEXT link as the test condition. If there is only one results page, there is no NEXT link, so I just scrape that first page. If there is more than one page, the last page also has no NEXT link, so the scraper function must run on that page too. The scraping function is called findRecords().
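The pagination logic described above is a do-while loop: the scrape runs first and the NEXT check happens at the end of each pass. A minimal, browser-free sketch of that shape, assuming a hypothetical `scrape_pages` function and a non-empty list of strings standing in for the results pages (every page except the last is treated as having a NEXT link):

```python
def scrape_pages(pages):
    """Emulate a do-while loop over a simulated paginated result set.

    `pages` is a hypothetical stand-in for the site: a non-empty list of page
    contents, where every page except the last has a NEXT link.
    """
    scraped = []
    index = 0
    while True:
        # Body first: the current page is always scraped, which covers both a
        # single results page and the final page of a longer set.
        scraped.append(pages[index])
        # The condition is tested only after the body, as in a do-while.
        has_next = index + 1 < len(pages)
        if not has_next:
            break
        index += 1  # stands in for clicking NEXT
    return scraped


print(scrape_pages(["page 1"]))                      # ['page 1']
print(scrape_pages(["page 1", "page 2", "page 3"]))  # ['page 1', 'page 2', 'page 3']
```

Python has no do-while statement, so `while True:` with a `break` at the bottom of the body is the usual way to express one.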

So I am isolating my NEXT link using:

next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")

So I want to run a loop that performs the scrape at least once (whether there is one results page or more). I am also clicking the NEXT button using its click() method. The code I have so far is:

while True:
    findRecords()
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if not next_link:
        break
    next_link.click()

This is not working. Well, it runs and it scrapes, but when it reaches the last page it gives me a NoSuchElementException:

Traceback (most recent call last):
  File "try.py", line 47, in
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 752, in find_element
    'value': value})['value']
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']"}
  (Session info: chrome=53.0.2785.89)
  (Driver info: chromedriver=2.20.353124 (035346203162d32c80f1dce587c8154a1efa0c3b),platform=Linux 3.13.0-92-generic x86_64)

I know the element does not exist on that last page; as I said before, the NEXT link is absent from the last page.

So how do I fix my while loop so that it scrapes a single results page and/or the last page when the condition is no longer true, and then breaks out of the loop cleanly without that hideous error?

PS: Other than the while loop above, I have also tried the following:

is_continue = True
while is_continue:
    findRecords()
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if next_link:
        is_continue = True
        next_link.click()
    else:
        is_continue = False 

And if it is any help, here is my scraper function findRecords() as well:

from bs4 import BeautifulSoup

# Relies on the globals `driver` (the Selenium WebDriver) and `letter`, defined elsewhere in the script.
def findRecords():
    filename = "sam_" + letter + ".csv"
    bsObj = BeautifulSoup(driver.page_source, "html.parser")
    tableList = bsObj.find_all("table", {"class": "width100 menu_header_top_emr"})
    tdList = bsObj.find_all("td", {"class": "menu_header width100"})

    # Pair each results table with its header cell and append one line per pair.
    for table, td in zip(tableList, tdList):
        a = table.find_all("span", {"class": "results_body_text"})
        b = td.find_all("span", {"class": "results_body_text"})
        with open(filename, "a") as csv_file:
            csv_file.write(', '.join(tag.get_text().strip() for tag in a + b) + '\n')
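Joining the cells with ', ' produces extra columns whenever a scraped value itself contains a comma. One possible alternative, sketched here assuming Python 3 and a hypothetical `write_rows` helper, is the standard library `csv` module, which quotes such values:

```python
import csv
import io


def write_rows(csv_file, rows):
    """Write each row (a list of cell strings) with csv.writer.

    csv.writer quotes any cell containing a comma, quote character or newline,
    so a scraped value such as "Smith, John" stays in a single column.
    """
    writer = csv.writer(csv_file)
    for row in rows:
        # Strip surrounding whitespace, as the original join does.
        writer.writerow([cell.strip() for cell in row])


# Demonstration with an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_rows(buffer, [["ACME Inc", " Smith, John "], ["Widgets", "Doe"]])
print(repr(buffer.getvalue()))  # 'ACME Inc,"Smith, John"\r\nWidgets,Doe\r\n'
```

Inside findRecords, this would mean opening the file once with `open(filename, "a", newline="")` and passing `[tag.get_text() for tag in a + b]` for each table. Note that `csv.writer` separates cells with a bare comma rather than ', ', and terminates rows with `\r\n` by default.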

Upvotes: 0

Views: 79

Answers (2)

Saurabh Gaur

Reputation: 23815

You should use find_elements instead. It returns either a list of WebElements or an empty list rather than raising an exception, so just check its length, as below:

while True:
    findRecords()
    next_link = driver.find_elements(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if len(next_link) == 0:
        break
    next_link[0].click()
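An alternative that keeps `find_element` is to catch the exception it raises when nothing matches. The sketch below is self-contained so it runs without a browser: `FakeDriver`, `FakeLink` and `scrape_all` are hypothetical, and the local `NoSuchElementException` class only stands in for `selenium.common.exceptions.NoSuchElementException`.

```python
class NoSuchElementException(Exception):
    """Local stand-in for selenium.common.exceptions.NoSuchElementException."""


class FakeLink:
    """Hypothetical NEXT link: clicking it moves the fake driver to the next page."""

    def __init__(self, driver):
        self.driver = driver

    def click(self):
        self.driver.page += 1


class FakeDriver:
    """Hypothetical driver with pages 1..page_count; every page but the last has NEXT."""

    def __init__(self, page_count):
        self.page = 1
        self.page_count = page_count

    def find_element(self, by, value):
        # Like Selenium's find_element, raise instead of returning None when nothing matches.
        if self.page < self.page_count:
            return FakeLink(self)
        raise NoSuchElementException(value)


def scrape_all(driver, scrape_page, next_xpath):
    """Scrape the current page, then follow NEXT until find_element raises.

    Returns the number of pages scraped.
    """
    pages_scraped = 0
    while True:
        scrape_page(driver)
        pages_scraped += 1
        try:
            next_link = driver.find_element("xpath", next_xpath)
        except NoSuchElementException:
            break  # no NEXT link: this was the last (or only) page
        next_link.click()
    return pages_scraped


visited = []
scrape_all(FakeDriver(3), lambda d: visited.append(d.page), "//a[contains(text(),'Next')]")
print(visited)  # [1, 2, 3]
```

With a real browser, import `NoSuchElementException` from `selenium.common.exceptions`, drop the stand-in classes, and pass the question's XPath (with `By.XPATH` in place of the `"xpath"` string). Only the lookup sits inside `try`, so an exception raised while scraping is not silently swallowed.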

Upvotes: 1

Grasshopper

Reputation: 9058

When searching for the Next link, change the code to use find_elements, which returns a list containing the Next link if it is present, or an empty list otherwise, without raising an exception.

next_link = driver.find_elements(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")

You then need logic to take the Next WebElement out of this list (for example next_link[0]) before clicking it.
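A minimal sketch of that logic, using a hypothetical helper `first_or_none` that turns the `find_elements` result into either the Next element or `None`. The commented loop assumes `driver`, `By` and `findRecords` from the question, and `NEXT_XPATH` is a hypothetical name for the question's XPath string.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_or_none(elements: Sequence[T]) -> Optional[T]:
    """Return the first item of a find_elements() result, or None if it is empty.

    find_elements() returns [] instead of raising when nothing matches, so this
    turns "no NEXT link" into a None that the loop can test for.
    """
    return elements[0] if elements else None


# Usage inside the pagination loop (names from the question; NEXT_XPATH is hypothetical):
# while True:
#     findRecords()
#     next_link = first_or_none(driver.find_elements(By.XPATH, NEXT_XPATH))
#     if next_link is None:
#         break
#     next_link.click()
```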

Upvotes: 2
