Python selenium: running into StaleElementReferenceException

Question

I am trying to scrape all job postings for the last 24 hours from Glassdoor and save them to a dictionary.

binary = FirefoxBinary('path_to_firefox_binary.exe')
cap = DesiredCapabilities().FIREFOX
cap["marionette"] = True
driver = webdriver.Firefox(firefox_binary=binary, capabilities=cap, executable_path=GeckoDriverManager().install())

base_url = 'https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=false&clickSource=searchBtn' \
       '&typedKeyword=data+sc&sc.keyword=data+scientist&locT=C&locId=1154532&jobType= '
driver.get(url=base_url)
driver.implicitly_wait(20)
driver.maximize_window()
WebDriverWait(driver, 20).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "div#filter_fromAge>span"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((
        By.XPATH, "//div[@id='PrimaryDropdown']/ul//li//span[@class='label' and contains(., 'Last Day')]"))).click()

# find job listing elements on web page
listings = driver.find_elements_by_class_name("jl")
n_listings = len(listings)

results = {}

for index in range(n_listings):
    driver.find_elements_by_class_name("jl")[index].click()  # runs into error
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name("empInfo.newDetails")
    emp = info.find_element_by_class_name("employerName")

results[index] = {'title': title, 'company': emp_name, 'description': description}

I keep running into the error message

selenium.common.exceptions.StaleElementReferenceException: Message: The element reference of is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

on the first line inside my for loop. Even when the loop runs for a few iterations, the exception eventually shows up. I am new to Selenium and web scraping and would appreciate any help.

Guy · Accepted Answer

Every time a new posting is selected, the clicked element is modified, and therefore the DOM is refreshed. The change is slow compared with the actions in the loop, so you need to slow the loop down a little. Instead of using a fixed sleep, you can wait for the changes to occur.

Every time you select a posting, a new class, selected, is added to it and its style attribute loses its content. You should wait for this to happen, get the information, and then click the next posting.

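# Assumes driver and n_listings from the question, plus its imports of By, WebDriverWait and EC.
wait = WebDriverWait(driver, 20)
for index in range(n_listings):
    # Wait until the selected listing has finished updating before reading it.
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.selected:not([style="border-bottom:0"])')))
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name('empInfo.newDetails')
    emp = info.find_element_by_class_name('employerName')
    # Move on to the next listing, unless this was the last one.
    if index < n_listings - 1:
        driver.find_element_by_css_selector('.selected + .jl').click()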
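
If a stale reference still slips through now and then, a common fallback is to locate the element again and retry the action. The following is only a minimal sketch and not part of the approach above: the helper name retry_on and its parameters are hypothetical, and it deliberately avoids Selenium imports so that it runs on its own. With Selenium you would pass StaleElementReferenceException (from selenium.common.exceptions, as in the error message) as the exception type.

```python
import time


def retry_on(exceptions, action, attempts=3, delay=0.5):
    """Call action() until it succeeds or the attempts run out.

    exceptions: an exception class (or tuple of classes) worth retrying.
    action: a zero-argument callable that locates the element afresh and uses it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)  # give the page a moment to settle
```

For example, retry_on(StaleElementReferenceException, lambda: driver.find_elements_by_class_name("jl")[index].click()) would look the listing up again on every attempt, so a reference is never reused after the DOM has changed. Waiting for the page to finish updating, as shown above, is still the cleaner fix; the retry only covers the cases where that wait is not enough.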
