Reputation: 139
I am trying to navigate a list of pagination links on a series of webpages like this one, and then retrieve the links on each page. This is the HTML:
<ul class="pagination">
<li class="active">
<a href="/dataset groups=heal&_groups_limit=0&page=1">1</a>
</li>
<li>
<a href="/dataset?groups=heal&_groups_limit=0&page=2">2</a>
</li>
<li>
<a href="/dataset?groups=heal&_groups_limit=0&page=3">3</a>
</li>
<li class="disabled">
<a href="#">...</a>
</li>
<li>
<a href="/dataset?groups=heal&_groups_limit=0&page=7">7</a>
</li>
<li>
<a href="/dataset?groups=heal&_groups_limit=0&page=2">»</a>
</li>
</ul>
I managed to find the number of pages, so I am trying to iterate over that. My idea was to select the active element and then click() on the next one. Being unfamiliar with XPath, I am stumbling on how to do that.
This is the code I am using:
driver.find_element_by_xpath("//li[class='active']/a//following").click()
Any help would be appreciated.
Upvotes: 1
Views: 2612
Reputation: 99
Since the URLs are in the format BASE_URL+page=NUM_PAGE, you can simply get the maximum page number (7 in your case).
That way you can build all the URLs with something like:
BASE_URL = "https://dati.comune.milano.it/dataset?groups=heal"
urls = []
for page_num in range(1, MAX_PAGES):
urls.append(f"{BASE_URL}&page={page_num}")
This way you'll have all the pages without having to click anything, just by knowing the maximum number of pages, which you can easily find as you already did.
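For the second part of your question (retrieving the links on each page), here is a minimal sketch of fetching each built URL without Selenium at all. It assumes the requests and beautifulsoup4 packages are available, and the 'dataset' substring filter is only an example; adapt it to the links you need.
# Minimal sketch: fetch each built URL and collect the links on it.
# Assumes requests and beautifulsoup4 are installed; the 'dataset'
# filter below is an example, not part of the original answer.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

all_links = []
for url in urls:
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for a in soup.find_all("a", href=True):
        if "dataset" in a["href"]:
            all_links.append(urljoin(url, a["href"]))  # make relative hrefs absolute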
SELENIUM SOLUTION
There are probably a thousand cleaner ways to do this, but this one works for me: simply loop over the list items until you find the one with class "active", as you said, and open the one that follows it.
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

BASE_URL = "https://dati.comune.milano.it/dataset?groups=heal&page=1"  # first page

driver = webdriver.Chrome(
    executable_path=ChromeDriverManager().install()
)
driver.get(BASE_URL)

url_list_xpath = "/html/body/div[2]/div/div[3]/div/section[1]/div[2]/ul"  # this is the page bar at the bottom
to_click = False
last_page = driver.find_element_by_xpath("/html/body/div[2]/div/div[3]/div/section[1]/div[2]/ul/li[5]/a") \
    .get_attribute("href")  # find the last page
current_page = BASE_URL

# iterate over the list items and open the url right after the active one
while current_page != last_page:
    ul = driver.find_element_by_xpath(url_list_xpath)
    for li in ul.find_elements_by_tag_name("li"):
        if to_click:
            break  # `li` is now the item right after the active one
        if li.get_attribute("class") == 'active':
            to_click = True
    to_click = False
    current_page = li.find_elements_by_tag_name("a")[0].get_attribute("href")
    driver.get(current_page)
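Incidentally, the XPath in your question is close: attribute tests need an @, and following-sibling selects the next list item. Untested against the live page, but assuming the class attribute is exactly "active", a sketch of clicking the next page link directly:
# Sketch only: click the link in the <li> right after the active one
driver.find_element_by_xpath("//li[@class='active']/following-sibling::li[1]/a").click()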
Upvotes: 1
Reputation: 105
EDITED (I adapted the code to your link)
from selenium import webdriver

driver = webdriver.Chrome('chromedriver.exe')
driver.get('https://dati.comune.milano.it/callgroup/e6528afc-bd2c-417b-99a8-d7704f942a42')

hrefs = []  # will contain the final list of links
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")  # scrolls to the bottom of the page

# grab all elements tagged 'a' (the ones that contain links)
hrefs_in_view = driver.find_elements_by_tag_name('a')

# finding relevant hrefs
for elem in hrefs_in_view:
    href = elem.get_attribute('href')
    if href is None:  # remove some irrelevant elements that have no href
        continue
    if 'dataset' in href:  # all the links should contain the word 'dataset'; change this to adapt it to your needs
        hrefs.append(href)  # add it to the list
The list hrefs will contain all the links you need.
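If the same link shows up more than once on the page, one way to drop duplicates while keeping the original order (just a suggestion, not part of the original code):
hrefs = list(dict.fromkeys(hrefs))  # dict keys preserve insertion order and are unique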
Upvotes: 2