user3691767

Reputation: 125

Selenium Python - Access next pages of search results

I have to click on each search result one by one from this url:

Search Guidelines

I first extract the total number of results from the displayed text so that I can set the upper limit for the iteration:

upperlimit=driver.find_element_by_id("total_results")
number = int(upperlimit.text.split(' ')[0])

The loop is then defined as for i in range(1,number):

However, after going through the first 10 results on the first page, the list index goes out of range (probably because there are no more links to click). I need to click on "Next" to get the next 10 results, and so on until I'm done with all the search results. How can I go about doing that?

Any help would be appreciated!

Upvotes: 2

Views: 8709

Answers (2)

alecxe

Reputation: 474003

The problem is that the value of the element with id total_results changes after the page is loaded: at first it contains 117, then it changes to 44.
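If you do want to keep reading total_results, one possible workaround is to poll the text until it stops changing. The following is only a rough sketch: wait_until_stable and its parameters are hypothetical names, not part of Selenium, and "unchanged for a few consecutive reads" is a heuristic that can still return an early value if the page updates slowly.

```python
import time


def wait_until_stable(read_value, required_repeats=3, interval=0.5, timeout=10.0):
    """Poll read_value() until it returns the same value several times in a row.

    read_value is any zero-argument callable, for example (assumed usage):
        lambda: driver.find_element_by_id("total_results").text
    """
    deadline = time.monotonic() + timeout
    last = read_value()
    repeats = 1
    while repeats < required_repeats:
        if time.monotonic() > deadline:
            raise TimeoutError("value did not settle within the timeout")
        time.sleep(interval)
        current = read_value()
        if current == last:
            repeats += 1
        else:
            # The value changed, so start counting again from this reading.
            last, repeats = current, 1
    return last
```

The returned text can then be parsed exactly as in the question, with int(text.split(' ')[0]).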

Instead, here is a more robust approach. It processes the results page by page until there are no more pages left:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Firefox()
url = 'http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true#/search/?searchText=bevacizumab&mode=&staticTitle=false&SEARCHTYPE_all2=true&SEARCHTYPE_all1=&SEARCHTYPE=GUIDANCE&TOPICLVL0_all2=true&TOPICLVL0_all1=&HIDEFILTER=TOPICLVL1&HIDEFILTER=TOPICLVL2&TREATMENTS_all2=true&TREATMENTS_all1=&GUIDANCETYPE_all2=true&GUIDANCETYPE_all1=&STATUS_all2=true&STATUS_all1=&HIDEFILTER=EGAPREFERENCE&HIDEFILTER=TOPICLVL3&DATEFILTER_ALL=ALL&DATEFILTER_PREV=ALL&custom_date_from=&custom_date_to=11-06-2014&PAGINATIONURL=%2FSearch.do%3FsearchText%40%40bevacizumab%26newsearch%40%40true%26page%40%40&SORTORDER=BESTMATCH'
driver.get(url)

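page_number = 1
while True:
    try:
        # Look for the pagination link labelled with the current page number.
        link = driver.find_element_by_link_text(str(page_number))
    except NoSuchElementException:
        # No such link: the last page has been processed.
        break
    link.click()
    print(driver.current_url)
    page_number += 1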

Basically, the idea here is to keep getting the next page link until there is no such link (a NoSuchElementException is then raised). Note that this works for any number of pages and results.

It prints:

http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=1
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=2#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=3#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=4#showfilter
http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=5#showfilter
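To also do some work on every page (for example, clicking each search result as the question asks), the same loop can be wrapped in a small helper that takes a callback. This is a minimal sketch, not a tested scraper: walk_pages, find_link, link_missing and on_page are hypothetical names introduced here, and the per-page work is left to the caller.

```python
def walk_pages(find_link, link_missing, on_page, first_page=1):
    """Click numbered pagination links until one is missing.

    find_link    -- callable taking the link text, returning a clickable element
    link_missing -- exception type raised by find_link when there is no such link
    on_page      -- callback invoked with the page number after each click
    Returns the number of pages visited.

    Assumed usage with the driver from the answer above:
        walk_pages(driver.find_element_by_link_text,
                   NoSuchElementException,
                   lambda n: print(n, driver.current_url))
    """
    page_number = first_page
    visited = 0
    while True:
        try:
            link = find_link(str(page_number))
        except link_missing:
            break  # no link for this page number, so we are done
        link.click()
        on_page(page_number)
        visited += 1
        page_number += 1
    return visited
```

Passing the lookup in as a callable keeps the loop independent of the Selenium version; newer releases look links up with driver.find_element(By.LINK_TEXT, text) rather than find_element_by_link_text.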

Upvotes: 2

PepperoniPizza

Reputation: 9112

There is no need to programmatically click the Next button at all. If you look carefully, the URL just needs a different page parameter for each results page:

url = "http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page={}#showfilter"

# range(1, 5) visits pages 1 through 4; raise the upper bound to cover more pages.
for i in range(1, 5):
    driver.get(url.format(i))

    upperlimit=driver.find_element_by_id("total_results")
    number = int(upperlimit.text.split(' ')[0])
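Rather than hard-coding the range, the number of pages could be derived from the total. A small sketch, assuming 10 results per page (as described in the question) and a total that has already been read reliably; page_count and page_urls are hypothetical helper names:

```python
URL_TEMPLATE = ("http://www.nice.org.uk/Search.do"
                "?searchText=bevacizumab&newsearch=true&page={}#showfilter")


def page_count(total_results, per_page=10):
    """Number of result pages needed, rounding up (integer arithmetic only)."""
    return (total_results + per_page - 1) // per_page


def page_urls(total_results, per_page=10, template=URL_TEMPLATE):
    """Build one URL per result page, numbered from 1."""
    last_page = page_count(total_results, per_page)
    return [template.format(n) for n in range(1, last_page + 1)]


# Assumed usage with an existing driver:
#     for page_url in page_urls(number):
#         driver.get(page_url)
```

With the 44 results mentioned in the other answer, this yields five URLs, page=1 through page=5.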

If you still want to programmatically click the Next button, you could use:

driver.find_element_by_class_name('next').click()

But I haven't tested that.

Upvotes: 0
