Vishesh Shrivastav

Reputation: 2139

Python selenium: running into StaleElementReferenceException

I am trying to scrape all job postings for the last 24 hours from Glassdoor and save them to a dictionary.

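from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.firefox import GeckoDriverManager

binary = FirefoxBinary('path_to_firefox_binary.exe')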
cap = DesiredCapabilities().FIREFOX
cap["marionette"] = True
driver = webdriver.Firefox(firefox_binary=binary, capabilities=cap, executable_path=GeckoDriverManager().install())

base_url = 'https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=false&clickSource=searchBtn' \
       '&typedKeyword=data+sc&sc.keyword=data+scientist&locT=C&locId=1154532&jobType= '
driver.get(url=base_url)
driver.implicitly_wait(20)
driver.maximize_window()
WebDriverWait(driver, 20).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "div#filter_fromAge>span"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((
        By.XPATH, "//div[@id='PrimaryDropdown']/ul//li//span[@class='label' and contains(., 'Last Day')]"))).click()

# find job listing elements on web page
listings = driver.find_elements_by_class_name("jl")
n_listings = len(listings)

results = {}

for index in range(n_listings):
    driver.find_elements_by_class_name("jl")[index].click()  # runs into error
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name("empInfo.newDetails")
    emp = info.find_element_by_class_name("employerName")

    results[index] = {'title': title, 'company': emp_name, 'description': description}

I keep running into the error message

selenium.common.exceptions.StaleElementReferenceException: Message: The element reference of is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

for the first line inside my for loop. Even when the loop runs for a few iterations, the exception eventually shows up. I am new to Selenium and web scraping and would appreciate any help.

Upvotes: 2

Views: 198

Answers (2)

Guy

Reputation: 50949

Every time a new posting is selected, the clicked element is modified, and therefore the DOM is refreshed. The change is slow, certainly compared to the actions in the loop, so you want to slow the loop down a little. Instead of using a fixed sleep, you can wait for the changes to occur.

Every time you select a posting, a new class, selected, is added to it and its style attribute loses its content. You should wait for this to happen, get the information, and then click the next posting.

wait = WebDriverWait(driver, 20)
for index in range(n_listings):
    # wait until the selected listing no longer carries style="border-bottom:0"
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.selected:not([style="border-bottom:0"])')))
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name('empInfo.newDetails')
    emp = info.find_element_by_class_name('employerName')
    if index < n_listings - 1:
        # move on by clicking the listing right after the selected one
        driver.find_element_by_css_selector('.selected + .jl').click()
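To illustrate the idea of waiting for a change instead of sleeping for a fixed time, here is a minimal sketch that runs without Selenium. The names wait_until, state and listing_is_selected are hypothetical and exist only for this illustration; in real code WebDriverWait does this job, and its internals may differ from this sketch.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.1):
    """Call condition() repeatedly until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} seconds".format(timeout))
        time.sleep(poll)


# Simulated page state (a stand-in for the DOM): the listing only
# counts as "selected" from the third poll onwards.
state = {"polls": 0}


def listing_is_selected():
    state["polls"] += 1
    return state["polls"] >= 3


wait_until(listing_is_selected, timeout=2.0, poll=0.01)
```

The loop returns as soon as the condition holds, so it costs only as much time as the page actually needs, whereas a fixed sleep is either too short or wastes time.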

Upvotes: 2

Kasra Sh

Reputation: 21

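This error means the element reference you are trying to click is no longer attached to the DOM, for example because the page re-rendered after the previous click. You have to make sure the target element exists and is freshly located before calling click(), or wrap the call in a try/except block and retry.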

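import time
from selenium.common.exceptions import StaleElementReferenceException

# ...
results = {}

for index in range(n_listings):
    for attempt in range(3):  # 3 attempts is an arbitrary choice
        try:
            # re-locate the listings on every attempt so the reference is fresh
            driver.find_elements_by_class_name("jl")[index].click()
            break
        except StaleElementReferenceException:
            print('Listing went stale, retrying in 1 second ...')
            time.sleep(1)
    else:
        continue  # every attempt failed; skip this listing
    print("clicked listing {}".format(index + 1))
    info = driver.find_element_by_class_name("empInfo.newDetails")
    emp = info.find_element_by_class_name("employerName")
# ...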
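The try/except idea can also be factored into a small reusable helper. The following is only a sketch under assumptions: retry_on, FakeStaleError and flaky_click are hypothetical names made up for this example, and FakeStaleError stands in for StaleElementReferenceException so the snippet runs without a browser.

```python
import time


def retry_on(exc_type, action, attempts=3, delay=1.0):
    """Run action(); if it raises exc_type, wait and try again, up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exc_type:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)


class FakeStaleError(Exception):
    """Stand-in for StaleElementReferenceException (assumption for this sketch)."""


calls = {"count": 0}


def flaky_click():
    # Fails on the first two calls, then succeeds, like an element
    # that is briefly stale while the page re-renders.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeStaleError("element went stale")
    return "clicked"


result = retry_on(FakeStaleError, flaky_click, attempts=5, delay=0.01)
```

With Selenium you would pass StaleElementReferenceException as exc_type and an action that looks the element up again on each call, e.g. lambda: driver.find_elements_by_class_name("jl")[index].click(), so every retry works with a fresh reference.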

Upvotes: 1
