Reputation: 5954
I've read about the StaleElementReferenceException in the official documentation, but I still don't understand why my code raises this exception. Does browser.get() instantiate a new spider?
class IndiegogoSpider(CrawlSpider):
    name = 'indiegogo'
    allowed_domains = ['indiegogo.com']
    start_urls = ['https://www.indiegogo.com/explore/all?project_type=all&project_timing=all&sort=trending']

    def parse(self, response):
        if (response.status != 404):
            options = Options()
            options.add_argument('-headless')
            browser = webdriver.Firefox(firefox_options=options)
            browser.get(self.start_urls[0])

            show_more = WebDriverWait(browser, 10).until(
                EC.element_to_be_clickable((By.XPATH, '//div[@class="text-center"]/a'))
            )
            while True:
                try:
                    show_more.click()
                except Exception:
                    break

            hrefs = WebDriverWait(browser, 60).until(
                EC.visibility_of_all_elements_located((By.XPATH, '//div[@class="discoverableCard"]/a'))
            )

            for href in hrefs:
                browser.get(href.get_attribute('href'))
                #
                # will be scraping individual pages here
                #

            browser.close()
I've tried the following, to no avail. I've also tried placing the links variable elsewhere in the script, in a different scope, also to no avail.
links = []
for href in hrefs:
    links.append(href.get_attribute('href'))

for link in links:
    browser.get(href.get_attribute('href'))
    #
    # will be scraping individual pages here
    #
Not sure why hrefs, and especially links, are erased from memory. When I extract the value of the href attribute of each item in the hrefs iterable and then stick all of the URLs in the links variable, shouldn't the links list be independent of the DOM and page changes?
Not sure what to do at this point. Any ideas?
Upvotes: 1
Views: 279
Reputation: 5647
As the documentation says:
A stale element reference exception is thrown in one of two cases, the first being more common than the second:
In your case it is because of browser.get(href.get_attribute('href')). When you navigate to another page, the DOM is completely reloaded, and the elements in hrefs no longer reference anything on the current page. That's why you are getting the error.
How do you deal with this error? You can do it like this:
links = []
for href in hrefs:  # store all links as plain strings
    links.append(href.get_attribute('href'))

for link in links:  # then just use them
    browser.get(link)
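To see why copying the attribute into a plain string works, here is a minimal, Selenium-free sketch of the principle. FakeElement is a hypothetical stand-in for a WebElement, not part of Selenium; it only illustrates that the strings survive a "page load" while the element objects do not:

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (for illustration only)."""

    def __init__(self, href):
        self._href = href
        self._stale = False

    def get_attribute(self, name):
        # A real WebElement raises StaleElementReferenceException here;
        # we simulate that with a plain RuntimeError.
        if self._stale:
            raise RuntimeError("stale element reference")
        return self._href


# Elements found on the listing page.
hrefs = [FakeElement("https://example.com/%d" % i) for i in range(3)]

# Extract the attribute values while the elements are still "attached".
links = [el.get_attribute("href") for el in hrefs]

# Simulate a page navigation, which invalidates every element reference.
for el in hrefs:
    el._stale = True

# The plain strings are independent of the DOM and survive the navigation.
print(links)
```

Reading every element through get_attribute after the navigation would raise, which is exactly what happens in the original loop.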
Upvotes: 2
Reputation: 1001
@Anthony, your second code block with links should work; it just looks like you have a copy/paste bug:

for link in links:
    browser.get(href.get_attribute('href'))

should be

for link in links:
    browser.get(link)
...
Upvotes: 1