Tendekai Muchenje

Reputation: 563

Navigating pagination with selenium

I am getting stuck on a weird case of pagination. I am scraping search results from https://cotthosting.com/NYRocklandExternal/LandRecords/protected/SrchQuickName.aspx

I have search results that fall into 4 categories.

1) There are no search results

2) There is one results page

3) There is more than one results page but fewer than 12 results pages

4) There are more than 12 results pages.

Case 1 is easy: I just pass.

# find_elements (plural) returns a list, so len() works on it
results = driver.find_elements_by_class_name('GridView')
if len(results) == 0:
    pass

For cases 2 and 3, I check whether the containing element holds at least one link and, if so, click it.

else:
    results_table = bsObj.find('table', {'class':'GridView'})
    sub_tables = results_table.find_all('table')
    next_page_links = sub_tables[1].find_all('a')
    if len(next_page_links) == 0:
        scrapeResults()
    else:
        scrapeResults()
        ####GO TO NEXT PAGE UNTIL THERE IS NO NEXT PAGE

Question for cases 2 and 3: what could I possibly check for here as my control?

The links are hrefs to pages 2, 3, etc. The tricky part is this: if I am on page 1, how do I make sure I am going to page 2, and when I am on page 2, how do I make sure I am going to page 3? The HTML of the pager on page 1 of the results is as follows:

<table cellspacing="0" cellpadding="0" border="0" style="border-collapse:collapse;">
   <tr>
      <td>Page: <span>1</span></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$2&#39;)">2</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$3&#39;)">3</a></td>
   </tr>
</table>

I can zero in on this table specifically using sub_tables[1]; see the bs4 code for case 2 above.

The problem is that there is no Next button I could use. Nothing else in the HTML changes from one results page to the next; the only thing that identifies the current page is the number in the span right before the links. I would like the loop to stop when it reaches the last page.

For case 4, the html looks like this:

<table cellspacing="0" cellpadding="0" border="0" style="border-collapse:collapse;">
   <tr>
      <td>Page: <span>1</span></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$2&#39;)">2</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$3&#39;)">3</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$4&#39;)">4</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$5&#39;)">5</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$6&#39;)">6</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$7&#39;)">7</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$8&#39;)">8</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$9&#39;)">9</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$10&#39;)">10</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$11&#39;)">...</a></td>
      <td><a href="javascript:__doPostBack(&#39;ctl00$cphMain$lrrgResults$cgvNamesDir&#39;,&#39;Page$Last&#39;)">Last</a></td>
   </tr>
</table>

The last two links are ... to show that there are more results pages and Last to signify the last page. However, the Last link exists on every page, and only on the last page itself is it not an active link.

Question for case 4: how could I check whether the Last link is clickable and use this as my stopping point?

Bigger question for case 4: how do I maneuver the ... link to get through the other results pages? The pager shows 12 entries at most, i.e. the ten pages nearest the current page, the ... link to more pages, and the Last link. So I don't know what to do if my results have, say, 88 pages.

I am linking a dump of a full sample page: https://ghostbin.com/paste/nrb27

Upvotes: 1

Views: 4668

Answers (4)

Mat

Reputation: 188

It simply worked for me.

driver.find_element_by_link_text("3").click()
driver.find_element_by_link_text("4").click()
....
driver.find_element_by_link_text("Last").click()  

Upvotes: 1

lauda

Reputation: 4173

What you should do is count the number of results on a page and divide the total number of results by it to estimate the total number of pages.

If you inspect the page you will see:

Displaying records 1 - 500 of 32563 at 10:08 AM ET on 9/16/2016

Knowing the total number of pages, start navigating, checking that each page has loaded if needed. Since you also know the current page, you can build a dynamic selector for the pager number. There are two cases:

  • if pagination number is not a link then you are on that page
  • if pagination number is a link you can use it to click

You shouldn't need 4 categories, since you can count the number of results and how many are displayed per page, and from that you know the number of pages.

  1. Create a method to navigate if needed with a for or other control structure
  2. For each navigation do what you need to do

Or go to the last page and start backwards until page 1 is not a link.
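The page-count estimate described above can be sketched roughly like this. It is only a sketch, assuming the banner always has the form "Displaying records A - B of N" as in the sample line quoted earlier and that it is read on the first results page; the name total_pages is hypothetical and not part of the site or of Selenium.

```python
import math
import re

# Assumed banner format, based on the single sample line quoted in this answer.
BANNER_RE = re.compile(r"Displaying records (\d+) - (\d+) of (\d+)")


def total_pages(banner_text):
    """Estimate the number of result pages from the 'Displaying records' banner.

    Returns 0 when no banner is found (for example, no search results).
    """
    match = BANNER_RE.search(banner_text)
    if match is None:
        return 0
    first, last, total = (int(group) for group in match.groups())
    per_page = last - first + 1  # records shown on the page the banner came from
    return math.ceil(total / per_page)
```

With the page count in hand, a plain for loop over the page numbers can drive the navigation instead of handling the four categories separately.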

Upvotes: 1

parik

Reputation: 2415

Click the "Last" link to get the number of the last page, then click through each page link in turn.

Upvotes: 0

kotoj

Reputation: 779

First of all, you have to know which page you are on. To do that:

Find element with current page number, using xpath:

from selenium.webdriver.common.by import By
currentPageElement = driver.find_element(By.XPATH, "//table[./tbody/tr/td[text()='Page: ']]//span")

Then extract the number:

currentPageNumber = int(currentPageElement.text)

Then you can do whatever you need: go to the next page by adding 1 to the current page number, go to the last page and read its number, etc.
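The "add 1 to the current page number" step can be sketched as below. This is a rough sketch that works on the pager HTML as a string, assuming it looks like the snippets in the question: a "Page: <span>N</span>" cell followed by __doPostBack links whose second argument is Page$N. In the question's case 4 sample, the ... link carries the next page number in its href (Page$11), so looking at the href rather than the link text may cover that case too. The name next_page_argument is hypothetical.

```python
import html
import re

# Captures the page argument of each pager link, e.g. "2" or "Last".
PAGE_ARG_RE = re.compile(r"__doPostBack\('[^']*','Page\$(\w+)'\)")
# Captures the current page number; assumes a bare <span> as in the question.
CURRENT_RE = re.compile(r"Page:\s*<span>(\d+)</span>")


def next_page_argument(pager_html):
    """Return 'Page$N' for the page after the current one, or None if there is none."""
    text = html.unescape(pager_html)  # turn &#39; back into '
    current = CURRENT_RE.search(text)
    if current is None:
        return None  # no pager found
    wanted = str(int(current.group(1)) + 1)
    if wanted in PAGE_ARG_RE.findall(text):
        return "Page$" + wanted
    return None  # no link to the next page: treat this as the last page
```

The link to click is then the one whose href contains the returned argument, and a None result can serve as the stopping condition.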

Upvotes: 1
