Reputation: 563
I am getting stuck on a weird case of pagination. I am scraping search results from https://cotthosting.com/NYRocklandExternal/LandRecords/protected/SrchQuickName.aspx
I have search results that fall into 4 categories.
1) There are no search results
2) There is one results page
3) There is more than one results page but fewer than 12 results pages
4) There are more than 12 results pages
Case 1 is easy: I just pass.
# find_elements (plural) returns a list, so len() works; it is empty when there are no results
results = driver.find_elements_by_class_name('GridView')
if len(results) == 0:
    pass
For cases 2 and 3, I check whether the containing element holds at least one page link and, if so, click it.
else:
    results_table = bsObj.find('table', {'class': 'GridView'})
    sub_tables = results_table.find_all('table')
    # the second nested table holds the pager links
    next_page_links = sub_tables[1].find_all('a')
    if len(next_page_links) == 0:
        scrapeResults()
    else:
        scrapeResults()
        #### GO TO NEXT PAGE UNTIL THERE IS NO NEXT PAGE
Question for cases 2 and 3: what could I possibly check for here as my control?
The links are hrefs to pages 2, 3, etc. The tricky part is this: if I am on page 1, how do I make sure I am going to page 2, and when I am on page 2, how do I make sure I am going to page 3? The HTML for the results page list on page 1 is as follows:
<table cellspacing="0" cellpadding="0" border="0" style="border-collapse:collapse;">
<tr>
<td>Page: <span>1</span></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$2')">2</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$3')">3</a></td>
</tr>
</table>
I can zero in on this table specifically using sub_tables[1] (see the bs4 code above for cases 2 and 3).
The problem is that there is no Next button I could use. Nothing else changes in the HTML from one results page to the next. There is nothing to identify the current page besides the number in the span right before the links. I would also like the scraper to stop when it reaches the last page.
For case 4, the html looks like this:
<table cellspacing="0" cellpadding="0" border="0" style="border-collapse:collapse;">
<tr>
<td>Page: <span>1</span></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$2')">2</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$3')">3</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$4')">4</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$5')">5</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$6')">6</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$7')">7</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$8')">8</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$9')">9</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$10')">10</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$11')">...</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$Last')">Last</a></td>
</tr>
</table>
The last two links are "..." to show that there are more results pages and "Last" to signify the last page. However, the Last link exists on every page, and it is only on the last page itself that it is not an active link.
Question for case 4: how could I check whether the Last link is clickable and use this as my stopping point?
Bigger question for case 4: how do I maneuver the "..." link to go through the other results pages? The results page list holds 12 values at most, i.e. the nearest ten pages to the current page, the "..." link to more pages, and the Last link. So I don't know what to do if my results have, say, 88 pages.
I am linking a dump of a full sample page: https://ghostbin.com/paste/nrb27
Upvotes: 1
Views: 4668
Reputation: 188
It simply worked for me.
driver.find_element_by_link_text("3").click()
driver.find_element_by_link_text("4").click()
....
driver.find_element_by_link_text("Last").click()
Upvotes: 1
Reputation: 4173
What you should do is count the number of results on a page and divide the total number of results by that value to estimate the total number of pages.
If you inspect the page you will see:
Displaying records 1 - 500 of 32563 at 10:08 AM ET on 9/16/2016
Knowing the total number of pages, start navigating, checking that each page has loaded if needed. Knowing the current page, you can build a dynamic selector for the page navigation number based on that page.
You shouldn't need 4 categories, since you can count the number of results and how many can be displayed on a page, and from that you know the number of pages.
Or go to the last page and work backwards until page 1 is not a link.
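The division described above can be sketched as follows. This is only a rough illustration: the function name estimate_page_count is made up for this example, and it assumes the range in the status line ("1 - 500") covers exactly one page of results and that the line is read on the first page (on the last page the range may be shorter).

```python
import math
import re

# Matches the status line quoted above, e.g.
# "Displaying records 1 - 500 of 32563 at 10:08 AM ET on 9/16/2016"
STATUS_RE = re.compile(r"Displaying records\s+(\d+)\s*-\s*(\d+)\s+of\s+(\d+)")


def estimate_page_count(status_text):
    """Estimate the number of results pages from the status line.

    Hypothetical helper: assumes the displayed range is one full page.
    """
    match = STATUS_RE.search(status_text)
    if match is None:
        # Assumption: no status line means there are no results at all.
        return 0
    first, last, total = (int(group) for group in match.groups())
    per_page = last - first + 1
    # Round up so a partially filled final page is still counted.
    return math.ceil(total / per_page)


status = "Displaying records 1 - 500 of 32563 at 10:08 AM ET on 9/16/2016"
print(estimate_page_count(status))
```

With the page count known, the scraping loop can run a fixed number of times instead of probing for a Next link.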
Upvotes: 1
Reputation: 2415
Click the "Last" link to get the number of the last page, and then click through each page link in turn.
Upvotes: 0
Reputation: 779
First of all, you have to know which page you are on. To do that:
Find the element with the current page number, using XPath:
from selenium.webdriver.common.by import By
currentPageElement = driver.find_element(By.XPATH, "//table[./tbody/tr/td[text()='Page: ']]//span")
Then extract the number:
currentPageNumber = int(currentPageElement.text)
And then you can do anything: go to the next page by adding 1 to the current page number, go to the last page and read the number, etc.
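One way to act on "current page number plus 1" is to look for the pager link whose postback argument is Page$N+1 rather than matching on the link text. In the HTML posted in the question, the "..." link carries Page$11, so this also picks it up when moving on from page 10. The sketch below is a minimal illustration, not a tested scraper: find_next_postback is a hypothetical helper, and it assumes the hrefs appear in the page source exactly as shown in the question.

```python
import re

# Captures the two arguments of the pager's __doPostBack('<target>','Page$<n>') call.
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)','Page\$(\w+)'\)")


def find_next_postback(pager_html, current_page):
    """Return (target, argument) for page current_page + 1, or None.

    Hypothetical helper; None means no link to the next page exists,
    which can serve as the stopping condition.
    """
    wanted = str(current_page + 1)
    for target, page in POSTBACK_RE.findall(pager_html):
        if page == wanted:
            return target, "Page$" + wanted
    return None


# Pager HTML for page 1, as posted in the question (cases 2 and 3).
pager = (
    "<td>Page: <span>1</span></td>"
    "<td><a href=\"javascript:__doPostBack("
    "'ctl00$cphMain$lrrgResults$cgvNamesDir','Page$2')\">2</a></td>"
    "<td><a href=\"javascript:__doPostBack("
    "'ctl00$cphMain$lrrgResults$cgvNamesDir','Page$3')\">3</a></td>"
)
print(find_next_postback(pager, 1))
print(find_next_postback(pager, 3))
```

With Selenium, the returned pair could presumably be passed to driver.execute_script("__doPostBack(arguments[0], arguments[1]);", target, argument), after which the current page number is read again from the span. That step is an assumption about how the site behaves and has not been checked against it.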
Upvotes: 1