Reputation: 79
Good evening, modern-day heroes, hope everyone's safe and sound!
What I'm hoping to achieve with this Selenium script is to load the page, click the BTC, ETH and XRP icons to filter the results, then keep clicking the "show more" button until the maximum number of elements (1138) has been loaded. After that I want to collect the hrefs of those 1138 companies, visit each company's page, and scrape further data points from each page visited.
With that said, I've tried lots of different approaches. Just printing the link of each company worked; however, the script fails to visit the extracted hrefs and raises "stale element reference: element is not attached to the page document".
I've heard that explicit/implicit waits could help fix this, but I can't wrap my head around how to use them with the links variable in particular, which is where the code stops with the error above.
I have a feeling the issue is with the while loop and how it handles the fact that I'm looping through a list of links that will be visited next. Can't emphasize enough how grateful I'll be if someone can guide me in the right direction!
from selenium.webdriver import Chrome
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import time
from selenium.common.exceptions import NoSuchElementException, ElementNotVisibleException

webdriver = '/Users/karimnabil/projects/selenium_js/chromedriver-1'
driver = Chrome(webdriver)
url = 'https://acceptedhere.io/catalog/company/'
driver.get(url)

btc = driver.find_element_by_xpath("//ul[@role='currency-list']/li[1]/a")
btc.click()
eth = driver.find_element_by_xpath("//ul[@role='currency-list']/li[2]/a")
eth.click()
xrp = driver.find_element_by_xpath("//ul[@role='currency-list']/li[5]/a")
xrp.click()
all_categories = driver.find_element_by_xpath("//div[@class='dropdownMenu']/ul/li[1]")
all_categories.click()
time.sleep(1)

maximun_number = 1138
while True:
    show_more = driver.find_element_by_xpath("//div[@class='row search-result']/div[3]/button")
    elements = driver.find_elements_by_xpath("//div[@class='row desktop-results mobile-hide']/div")
    if len(elements) > maximun_number:
        break
    show_more.click()
    time.sleep(1)

for element in elements:
    links = element.find_elements_by_xpath(".//div/div/div[2]/div/div/div[1]/a")
    links = [url.get_attribute('href') for url in links]
    time.sleep(0.5)
    for link in links:
        driver.get(link)
        company_title = driver.find_element_by_xpath("//h3").text
        print(company_title)
Upvotes: 0
Views: 267
Reputation: 89
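When you navigate to another page, the elements you stored in variables (e.g. show_more, or the items in elements) become stale, because they belong to the previous page's document. Collect the href strings from all of the elements before you navigate anywhere, rather than holding on to the elements themselves. You may also need to wait for an element to load or become clickable before you use it.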
Upvotes: 1