CSlater

Reputation: 73

How to click "Next" button until it no longer exists - Python, Selenium, Requests

I am scraping data from a paginated webpage. Once I finish scraping one page, I need to click the "Next" button and continue scraping the next page. I need to stop once I have scraped every page and the "Next" button no longer exists. Below is the HTML around the "Next" button that I need to click.

<tr align="center"> 
   <td colspan="8" bgcolor="#FFFFFF">
     <br> 
     <span class="paging">
       <b> -- Page 1 of 3 -- </b>
     </span>
     <p>
       <span class="paging"> 
       <a href="?page=100155&amp;by=state&amp;state=AL&amp;pagenum=2">
           <b>Next -&gt;</b>
         </a> 
           &nbsp;&nbsp;
       </span> 
       <span class="paging"> 
         <a href="?page=100155&amp;by=state&amp;state=AL&amp;pagenum=3">Last -&gt;&gt;</a> 
       </span>
     </p>
   </td>
</tr>

I have tried selecting the button by class name and by link text, but neither has worked in my attempts so far.

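Two examples of my code:

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Attempt 1: locate the link by its visible text
while True:
    try:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "Next ->"))).click()
    except TimeoutException:
        break

# Attempt 2: locate it by the class of the enclosing <span>
while True:
    try:
        WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "paging"))).click()
    except TimeoutException:
        break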

None of the solutions I have found online have worked; most of them end with the following error:

ElementClickInterceptedException: Message: element click intercepted: Element <a href="?page=100155&amp;by=state&amp;state=AL&amp;pagenum=2">...</a> is not clickable at point (119, 840). Other element would receive the click: <body class="custom-background hfeed" style="position: relative; min-height: 100%; top: 0px;">...</body>
  (Session info: chrome=76.0.3809.132)

If the rest of the error output would help, please let me know and I will add it to the post.

I have looked at the following resources, all to no avail:

Python Selenium clicking next button until the end

python - How to click "next" in Selenium until it's no longer available?

Python Selenium Click Next Button


Selenium clicking next button programmatically until the last page

How can I make Selenium click on the "Next" button until it is no longer possible?

Could anyone provide suggestions on how I can select the "Next" button (if it exists) and go to the next page with this set of HTML? Please let me know if you need any further clarification on the request.

Upvotes: 3

Views: 1702

Answers (2)

Gidoneli

Reputation: 113

This problem can be approached with either of two main libraries: Selenium, or Requests (together with BeautifulSoup for parsing).

Approach: on every page, scrape the current page number and the next-page link

Using Selenium (if the site is dynamic)

We can check whether the current page is the last page. If it is not, we read the URL from the "Next" link and load it directly instead of clicking the button. This assumes the website uses the same HTML structure for the paging controls on every page.

import time

from selenium.webdriver.common.by import By

stop = False
driver.get(url)
while not stop:
    # ... scrape the current page here ...
    paging_elements = driver.find_elements(By.CLASS_NAME, "paging")

    # "-- Page 1 of 3 --" -> strip spaces/dashes -> "Page 1 of 3" -> ["Page 1 ", " 3"]
    page_numbers = paging_elements[0].text.strip(" -").split("of")

    ## Getting the current page number and the final page number
    final = int(page_numbers[1].strip())
    current = int(page_numbers[0].split("Page")[-1].strip())

    if current == final:
        stop = True
    else:
        # Assumes the second-to-last "paging" span holds the "Next ->" link
        next_page_link = paging_elements[-2].find_element(By.TAG_NAME, "a").get_attribute("href")
        driver.get(next_page_link)
        time.sleep(5)  # This gap can be changed as per the load time of the page
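The split-based parsing above depends on the exact spacing of the "Page X of Y" text. A slightly more tolerant sketch uses a regular expression instead. It assumes the indicator always contains the words "Page X of Y"; the helper name `parse_page_indicator` is hypothetical.

```python
import re

# Matches e.g. "-- Page 1 of 3 --", tolerating extra whitespace and newlines
PAGE_RE = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)", re.IGNORECASE)


def parse_page_indicator(text):
    """Return (current, final) from text like "-- Page 1 of 3 --".

    Hypothetical helper; raises ValueError if no "Page X of Y" is found.
    """
    match = PAGE_RE.search(text)
    if match is None:
        raise ValueError(f"no page indicator in {text!r}")
    return int(match.group(1)), int(match.group(2))


current, final = parse_page_indicator("\n  -- Page 1 of 3 -- \n")
print(current, final)    # 1 3
print(current == final)  # False: there is still a next page
```

In the loop above, the two `int(...)` lines could then be replaced by `current, final = parse_page_indicator(paging_elements[0].text)`.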

Using Requests and BS4 (if the site is static)

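from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

r = requests.get(url)
stop = False
while not stop:
    soup = BeautifulSoup(r.text, 'html.parser')
    # ... scrape the current page here ...

    paging_elements = soup.find_all('span', attrs={'class': "paging"})
    # get_text(strip=True) drops surrounding whitespace/newlines -> "-- Page 1 of 3 --"
    page_numbers = paging_elements[0].get_text(strip=True).strip(" -").split("of")

    ## Getting the current page number and the final page number
    final = int(page_numbers[1].strip())
    current = int(page_numbers[0].split("Page")[-1].strip())

    if current == final:
        stop = True
    else:
        next_page_link = paging_elements[-2].find("a").get('href')
        # The href is relative ("?page=..."), so resolve it against the current URL
        r = requests.get(urljoin(r.url, next_page_link))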
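To illustrate the "stop when Next disappears" test without a browser or BeautifulSoup, here is a minimal sketch that swaps in the standard library's `html.parser` (this is not what the answer above uses). It assumes the Next link's visible text starts with "Next"; `NextLinkFinder`, `find_next_href`, and the sample HTML strings are hypothetical.

```python
from html.parser import HTMLParser


class NextLinkFinder(HTMLParser):
    """Records the href of the first <a> whose text starts with "Next"."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the <a> we are currently inside
        self._text = []        # text fragments seen inside that <a>
        self.next_href = None  # result; stays None on the last page

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip(" .\n\t")
            if self.next_href is None and text.startswith("Next"):
                self.next_href = self._href.strip()
            self._href = None


def find_next_href(html):
    """Return the Next link's href, or None if the page has no Next link."""
    finder = NextLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.next_href


PAGE_1 = """
<span class="paging"><b> -- Page 1 of 3 -- </b></span>
<p>
  <span class="paging"><a href="?page=100155&amp;by=state&amp;state=AL&amp;pagenum=2"><b>Next -&gt;</b></a>&nbsp;&nbsp;</span>
  <span class="paging"><a href="?page=100155&amp;by=state&amp;state=AL&amp;pagenum=3">Last -&gt;&gt;</a></span>
</p>
"""
LAST_PAGE = '<span class="paging"><b> -- Page 3 of 3 -- </b></span>'

print(find_next_href(PAGE_1))     # ?page=100155&by=state&state=AL&pagenum=2
print(find_next_href(LAST_PAGE))  # None
```

In a Requests loop you would call `find_next_href(r.text)` after each fetch and stop when it returns None; the returned href is still relative and must be resolved against the current URL (for example with `urllib.parse.urljoin`).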

Alternative approaches

One alternative is to navigate by URL instead of clicking the button, which avoids the intercepted click entirely.

Many paginated websites add a page parameter to the URL (often visible only from page 2 onward). For example, a paginated website might have URLs such as:

www.targetwebsite.com/category?page_num=1

www.targetwebsite.com/category?page_num=2

www.targetwebsite.com/category?page_num=3

and so on.

In such cases, you can simply iterate over the page numbers up to the final page number (read from the "Page X of Y" text as in the approach above). This avoids breakage if the target website changes its CSS layout or styling.
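A minimal sketch of the URL-based approach, assuming the page number lives in a query parameter named `pagenum` (as in the question's HTML) and that the total page count is already known, for example from the "Page 1 of 3" text. The function name `build_page_urls` and the `example.com` base URL are placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_page_urls(base_url, total_pages, param="pagenum"):
    """Return one URL per page, setting `param` to 1..total_pages."""
    parts = urlsplit(base_url)
    # Keep the existing query parameters, dropping any old page number
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    urls = []
    for n in range(1, total_pages + 1):
        new_query = urlencode(query + [(param, str(n))])
        urls.append(urlunsplit(parts._replace(query=new_query)))
    return urls


for u in build_page_urls("https://example.com/list.php?page=100155&by=state&state=AL", 3):
    print(u)
# https://example.com/list.php?page=100155&by=state&state=AL&pagenum=1
# https://example.com/list.php?page=100155&by=state&state=AL&pagenum=2
# https://example.com/list.php?page=100155&by=state&state=AL&pagenum=3
```

Each URL can then be loaded in turn with `requests.get` or `driver.get`, so no button is ever clicked.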

You may also need to build next_page_link by prepending the base URL, as done for next_url in the other question (lines 40-41):

next_url = next_link.find("a").get("href")

r = session.get("https://reverb.com/marketplace" + next_url)
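String concatenation only works when the relative href has exactly the shape you expect. The standard library's `urllib.parse.urljoin` resolves a relative href against the page it came from, the way a browser does. A short illustration with placeholder URLs:

```python
from urllib.parse import urljoin

current = "https://example.com/list.php?page=100155&by=state&state=AL&pagenum=1"

# Query-only href, like the question's "Next ->" link
print(urljoin(current, "?page=100155&by=state&state=AL&pagenum=2"))
# https://example.com/list.php?page=100155&by=state&state=AL&pagenum=2

# Root-relative href
print(urljoin("https://reverb.com/marketplace?page=1", "/marketplace?page=2"))
# https://reverb.com/marketplace?page=2

# An absolute href is returned unchanged
print(urljoin(current, "https://example.org/other"))
# https://example.org/other
```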

I hope this helps!

Upvotes: 2

CEH

Reputation: 5909

It sounds like you're asking two different questions here:

  1. How to click the Next button until it no longer exists
  2. How to click the Next button with JavaScript

Here's a solution to #2 (JavaScript clicking):

// Extension method: declare it inside a non-generic static class
public static void ExecuteJavaScriptClickButton(this IWebDriver driver, IWebElement element)
{
    ((IJavaScriptExecutor) driver).ExecuteScript("arguments[0].click();", element);
}

In the above code, you cast your WebDriver instance to IJavaScriptExecutor, which allows you to run JS code through Selenium. The parameter element is the element you wish to click (in this case, the Next button).

Based on your code sample, your JavaScript click may look something like this:

var nextButton = driver.FindElement(By.LinkText("Next ->"));
driver.ExecuteJavaScriptClickButton(nextButton);
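Since the question uses Python, here is a minimal sketch of the same JavaScript click with Selenium's Python bindings, where `WebDriver.execute_script` plays the role of `IJavaScriptExecutor.ExecuteScript`. The helper name `js_click` is hypothetical.

```python
def js_click(driver, element):
    """Click `element` via JavaScript instead of Selenium's native click.

    The native click is what raises ElementClickInterceptedException when
    another element (here, <body>) would receive the click at that point;
    a JS click calls the element's click() method directly.
    """
    driver.execute_script("arguments[0].click();", element)


# Usage (assumes an existing `driver` and
# `from selenium.webdriver.common.by import By`):
# next_button = driver.find_element(By.LINK_TEXT, "Next ->")
# js_click(driver, next_button)
```

Note that a JavaScript click does not behave like a real user's click (it ignores whatever is covering the link), so it is a workaround for the intercepted click rather than a faithful simulation.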

Now, moving on to your other issue: clicking until the button no longer exists. I would implement this as a while loop that exits once the Next button is gone. I also recommend writing a helper that checks for the presence of the Next button and returns false instead of throwing NoSuchElementException when the button does not exist, so a missing button does not break your test. Here's a sample that includes an ElementExists implementation:


// Both methods are extension methods on IWebDriver, so they must be declared
// inside a non-generic static class (e.g. alongside ExecuteJavaScriptClickButton)
public static bool ElementExists(this IWebDriver driver, By by)
{
    // attempt to find the element -- return true if we find it
    try
    {
        return driver.FindElements(by).Count > 0;
    }

    // FindElements normally returns an empty collection rather than throwing;
    // this catch is a defensive fallback -- return false
    catch (NoSuchElementException)
    {
        return false;
    }
}

public static void ClickNextUntilInvisible(this IWebDriver driver)
{
    while (driver.ElementExists(By.LinkText("Next ->")))
    {
        // find next button inside while loop so it does not go stale
        var nextButton = driver.FindElement(By.LinkText("Next ->"));

        // click next button using javascript
        driver.ExecuteJavaScriptClickButton(nextButton);
    }
}

This while loop checks for the presence of the Next button on each iteration and exits once the button no longer exists. Inside the loop, we call driver.FindElement again for each click, so that we do not get a StaleElementReferenceException.
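The same loop in Python, as a minimal sketch. Selenium's `driver.find_elements` returns an empty list instead of raising when nothing matches, so the existence check needs no try/except. The function name, the `scrape_page` callback, and the `max_pages` safety cap are hypothetical additions; the string "link text" is the value of Selenium's `By.LINK_TEXT` constant, and the sketch assumes the link's visible text is exactly "Next ->".

```python
NEXT_LOCATOR = ("link text", "Next ->")  # equivalent to (By.LINK_TEXT, "Next ->")


def click_next_until_gone(driver, scrape_page, max_pages=1000):
    """Scrape the current page, then JS-click "Next ->" until it no longer exists.

    `scrape_page` is a hypothetical callback that receives the driver and
    extracts data from the current page. Returns the number of pages scraped.
    """
    pages = 0
    while pages < max_pages:
        scrape_page(driver)
        pages += 1
        # find_elements returns [] (no exception) when the link is absent
        links = driver.find_elements(*NEXT_LOCATOR)
        if not links:
            break  # last page reached
        # Re-locate the link on every iteration so it is never stale
        driver.execute_script("arguments[0].click();", links[0])
        # With a real browser, wait for the old page to unload before the next
        # lookup, e.g. WebDriverWait(driver, 10).until(EC.staleness_of(links[0]))
    return pages
```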

Hope this helps.

Upvotes: 0
