Reputation: 73
I am scraping data from a paginated webpage. Once I finish scraping one page, I need to click the Next button and continue with the following page, and stop once all pages have been scraped and the Next button no longer exists. Below is the HTML around the "Next" button that I need to click.
<tr align="center">
  <td colspan="8" bgcolor="#FFFFFF">
    <br>
    <span class="paging">
      <b> -- Page 1 of 3 -- </b>
    </span>
    <p>
      <span class="paging">
        <a href="page=100155&by=state&state=AL&pagenum=2"> .
          <b>Next -></b>
        </a>
      </span>
      <span class="paging">
        <a href=" page=100155&by=state&state=AL&pagenum=3">Last ->></a>
      </span>
    </p>
  </td>
</tr>
I have tried selecting by class and by link text, but neither has worked in my attempts so far.
Two examples of my code:
while True:
    try:
        link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "Next ->"))).click()
    except TimeoutException:
        break

while True:
    try:
        link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, "paging"))).click()
    except TimeoutException:
        break
None of the solutions I have found online have worked; most of them end with the following error:
ElementClickInterceptedException: Message: element click
intercepted: Element <a href="?
page=100155&by=state&state=AL&pagenum=2">...</a> is not
clickable at point (119, 840). Other element would receive the
click: <body class="custom-background hfeed" style="position:
relative; min-height: 100%; top: 0px;">...</body>
(Session info: chrome=76.0.3809.132)
If the remainder of the error code would be helpful to review, please let me know and I will update the post with this error.
I have looked at the following resources, all to no avail:
Python Selenium clicking next button until the end
python - How to click "next" in Selenium until it's no longer available?
Python Selenium Click Next Button
Selenium clicking next button programmatically until the last page
How can I make Selenium click on the "Next" button until it is no longer possible?
Could anyone provide suggestions on how I can select the "Next" button (if it exists) and go to the next page with this set of HTML? Please let me know if you need any further clarification on the request.
Upvotes: 3
Views: 1702
Reputation: 113
This problem can be approached with either of two libraries: selenium or requests.
In both cases, check whether the current page is the last page; if it is not, follow the link in the Next button (this assumes the website uses the same HTML structure for paging on every page).
# Approach 1: selenium
import time
from selenium.webdriver.common.by import By

stop = False
driver.get(url)
while not stop:
    # (scrape the current page here)
    paging_elements = driver.find_elements(By.CLASS_NAME, "paging")
    # The first "paging" span holds text like "-- Page 1 of 3 --"
    page_numbers = paging_elements[0].text.strip(" -- ").split("of")
    # Get the current page number and the final page number
    final = int(page_numbers[1].strip())
    current = int(page_numbers[0].split("Page")[-1].strip())
    if current == final:
        stop = True
    else:
        # The second-to-last "paging" span wraps the Next link
        next_page_link = paging_elements[-2].find_element(By.TAG_NAME, "a").get_attribute("href")
        driver.get(next_page_link)
        time.sleep(5)  # This gap can be changed as per the load time of the page
# Approach 2: requests + BeautifulSoup
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

r = requests.get(url)
stop = False
while not stop:
    soup = BeautifulSoup(r.text, "html.parser")
    # (scrape the current page here)
    paging_elements = soup.find_all("span", attrs={"class": "paging"})
    page_numbers = paging_elements[0].text.strip(" -- ").split("of")
    # Get the current page number and the final page number
    final = int(page_numbers[1].strip())
    current = int(page_numbers[0].split("Page")[-1].strip())
    if current == final:
        stop = True
    else:
        # The href may be relative, so resolve it against the current URL
        next_page_link = urljoin(r.url, paging_elements[-2].find("a").get("href").strip())
        r = requests.get(next_page_link)
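Both snippets parse the "-- Page 1 of 3 --" counter with chained strip/split calls. As a rough sketch, the same step could be pulled into a small helper based on a regular expression. The name parse_page_counter is made up for illustration, and the pattern assumes the counter always reads "Page X of Y" as in the HTML from the question.

```python
import re


def parse_page_counter(text):
    """Return (current, final) parsed from text such as '-- Page 1 of 3 --'.

    Hypothetical helper: assumes the counter has the form 'Page X of Y'.
    """
    match = re.search(r"Page\s+(\d+)\s+of\s+(\d+)", text)
    if match is None:
        raise ValueError(f"no page counter found in {text!r}")
    return int(match.group(1)), int(match.group(2))
```

In either loop, current, final = parse_page_counter(paging_elements[0].text) would then replace the three parsing lines, and a missing counter raises a clear error instead of an IndexError.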
Another method is to use the page URL directly instead of clicking the button, since the button click is intercepted in this case.
Many paginated websites put a page parameter in the URL (often visible only for pages >= 2). So, a paginated website might have URLs such as:
www.targetwebsite.com/category?page_num=1
www.targetwebsite.com/category?page_num=2
www.targetwebsite.com/category?page_num=3
and so on.
In such cases, one can simply iterate over the page numbers up to the final page number (obtained as in the code above). This approach also keeps working if the target website changes its CSS layout or styling.
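A minimal sketch of that iteration, assuming the page number lives in a query parameter. The function name page_urls and the base URL https://example.com/list are placeholders, while the query parameters are the ones from the question's HTML.

```python
from urllib.parse import urlencode


def page_urls(base_url, params, last_page, page_param="pagenum"):
    """Yield one URL per page, from page 1 to last_page inclusive.

    Hypothetical helper: base_url and params are supplied by the caller.
    """
    for page in range(1, last_page + 1):
        # Copy the fixed parameters and set the page number for this page
        query = dict(params, **{page_param: page})
        yield f"{base_url}?{urlencode(query)}"


# Example with a placeholder base URL and the parameters from the question
for page_url in page_urls(
    "https://example.com/list",
    {"page": 100155, "by": "state", "state": "AL"},
    last_page=3,
):
    print(page_url)
```

Each generated URL can then be passed to driver.get or requests.get, with last_page read from the page counter on the first page.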
Furthermore, the href may be relative, in which case next_page_link has to be built by prepending the base URL, as done for next_url in the other question (lines 40-41):
next_url = next_link.find("a").get("href")
r = session.get("https://reverb.com/marketplace" + next_url)
I hope this helps!
Upvotes: 2
Reputation: 5909
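It sounds like you're asking two different questions here:
1. How to keep clicking the Next button until it no longer exists
2. How to get around the ElementClickInterceptedException when clicking it
Here's a solution to #2 -- Javascript clicking: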
// Extension method: must be declared inside a static class
public static void ExecuteJavaScriptClickButton(this IWebDriver driver, IWebElement element)
{
    ((IJavaScriptExecutor) driver).ExecuteScript("arguments[0].click();", element);
}
In the above code, you have to cast your WebDriver instance as IJavaScriptExecutor, which allows you to run JS code through Selenium. The parameter element is the element you wish to click -- in this case, the Next button.
Based on your code sample, your Javascript click may look something like this:
var nextButton = driver.FindElement(By.LinkText("Next ->"));
driver.ExecuteJavaScriptClickButton(nextButton);
Now, moving on to your other issue -- clicking until the button is no longer visible. I would implement this in a while loop that breaks whenever the Next button no longer exists. I also recommend implementing a function that checks for the presence of the Next button and ignores the ElementNotFound or NoSuchElement exception in case the button does not exist, to avoid breaking your test. Here's a sample that includes an ElementExists implementation:
// Extension method: must be declared inside a static class
public static bool ElementExists(this IWebDriver driver, By by)
{
    // attempt to find the element -- return true if we find it
    try
    {
        return driver.FindElements(by).Count > 0;
    }
    // catch exception where we did not find the element -- return false
    catch (Exception)
    {
        return false;
    }
}

public void ClickNextUntilInvisible()
{
    while (driver.ElementExists(By.LinkText("Next ->")))
    {
        // find next button inside while loop so it does not go stale
        var nextButton = driver.FindElement(By.LinkText("Next ->"));
        // click next button using javascript
        driver.ExecuteJavaScriptClickButton(nextButton);
    }
}
This while loop checks for the presence of the Next button with each iteration. If the button does not exist, the loop breaks. Inside the loop, we call driver.FindElement with each successive click, so that we do not get a StaleElementReferenceException.
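Since the question uses Python, here is a rough port of the same idea to Selenium's Python bindings. It is only a sketch: click_next_until_gone, the scrape callback and the max_pages guard are names I made up, and it assumes the driver exposes find_elements and execute_script as the Python WebDriver does. A JavaScript click does not wait for the next page to load, so a real loop would probably need an explicit wait after each click.

```python
def click_next_until_gone(driver, locator, scrape=None, max_pages=1000):
    """Visit pages by JavaScript-clicking the element at `locator` until it is gone.

    Hypothetical helper. `locator` is a (by, value) tuple, for example
    (By.PARTIAL_LINK_TEXT, "Next"). Returns the number of pages visited.
    """
    pages = 0
    while pages < max_pages:  # guard against an endless loop
        if scrape is not None:
            scrape(driver)  # process the page that is currently loaded
        pages += 1
        # find_elements returns an empty list instead of raising when nothing
        # matches, and re-finding on every pass avoids stale element references
        matches = driver.find_elements(*locator)
        if not matches:
            break
        # Click through JavaScript so an overlapping element cannot intercept it
        driver.execute_script("arguments[0].click();", matches[0])
    return pages
```

With a real driver this would be called as click_next_until_gone(driver, (By.PARTIAL_LINK_TEXT, "Next"), scrape=my_scraper), where my_scraper is whatever function extracts the data from the current page.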
Hope this helps.
Upvotes: 0