oldboy

Reputation: 5954

StaleElementReferenceException

I've read about the StaleElementReferenceException in the official documentation, but I still don't understand why my code raises this exception. Does browser.get() instantiate a new spider?

from scrapy.spiders import CrawlSpider
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

class IndiegogoSpider(CrawlSpider):
    name = 'indiegogo'
    allowed_domains = [ 'indiegogo.com' ]
    start_urls = [ 'https://www.indiegogo.com/explore/all?project_type=all&project_timing=all&sort=trending' ]

    def parse(self, response):

        if (response.status != 404):
            options = Options()
            options.add_argument('-headless')
            browser = webdriver.Firefox(firefox_options=options)
            browser.get(self.start_urls[0])

            show_more = WebDriverWait(browser, 10).until(
                EC.element_to_be_clickable((By.XPATH, '//div[@class="text-center"]/a'))
            )

            while True:
                try:
                    show_more.click()
                except Exception:
                    break

            hrefs = WebDriverWait(browser, 60).until(
                EC.visibility_of_all_elements_located((By.XPATH, '//div[@class="discoverableCard"]/a'))
            )

            for href in hrefs:
                browser.get(href.get_attribute('href'))

                #
                # will be scraping individual pages here
                #


            browser.close()

I've tried the following to no avail. I've also tried placing the links variable elsewhere in the script, in a different scope, also to no avail.

            links = []

            for href in hrefs:
                links.append(href.get_attribute('href'))

            for link in links:
                browser.get(href.get_attribute('href'))

                #
                # will be scraping individual pages here
                #

I'm not sure why hrefs, and especially links, seem to be erased from memory. When I extract the value of the href attribute of each item in the hrefs iterable and then store all of the URLs in the links variable, shouldn't the links list be independent of the DOM and of page changes?

Not sure what to do at this point. Any ideas?

Upvotes: 1

Views: 279

Answers (2)

Andrei

Reputation: 5647

As the documentation says:

A stale element reference exception is thrown in one of two cases, the first being more common than the second:

  • The element has been deleted entirely.
  • The element is no longer attached to the DOM.

In your case it is:

  • The element is no longer attached to the DOM.

It is because of browser.get(href.get_attribute('href')). When you navigate to another page, the DOM is completely reloaded, and the elements in hrefs no longer reference anything on the new page. That's why you are getting the error.

How to deal with this error? You can do it like this:

links = []

for href in hrefs:  # store all links as strings
    links.append(href.get_attribute('href'))

for link in links:  # then just use them
    browser.get(link)
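The intuition behind this fix is that plain strings are independent of the DOM, while WebElement handles are only valid for the page they were found on. Here is a minimal pure-Python sketch of that behavior; FakeBrowser and FakeElement are hypothetical stand-ins, not Selenium classes, used only to illustrate why copying attribute values before navigating avoids the error:

```python
class StaleElementReferenceException(Exception):
    pass

class FakeElement:
    # Stand-in for a WebElement: only valid while its page is loaded.
    def __init__(self, browser, href):
        self._browser = browser
        self._page = browser.page  # page this element belongs to
        self._href = href

    def get_attribute(self, name):
        if self._browser.page is not self._page:
            raise StaleElementReferenceException(
                "element is no longer attached to the DOM")
        return self._href

class FakeBrowser:
    def __init__(self):
        self.page = object()  # token identifying the current page

    def find_elements(self, hrefs):
        return [FakeElement(self, h) for h in hrefs]

    def get(self, url):
        self.page = object()  # navigation replaces the DOM

browser = FakeBrowser()
elements = browser.find_elements(["https://a.example", "https://b.example"])

# Copy the attribute values while the page is still loaded...
links = [el.get_attribute("href") for el in elements]

browser.get(links[0])  # navigation invalidates the old elements

# ...the strings survive navigation:
assert links == ["https://a.example", "https://b.example"]

# ...but the stale element handles now raise:
try:
    elements[1].get_attribute("href")
except StaleElementReferenceException:
    pass
```

The same logic applies to the real WebElements: extract every string you need before calling browser.get(), and only iterate over the strings afterwards.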

Upvotes: 2

tankthinks

Reputation: 1001

@Anthony, your second code block with links should work; it just looks like you have a copy/paste bug:

for link in links:
    browser.get(href.get_attribute('href'))

should be

for link in links:
    browser.get(link)
    ... 

Upvotes: 1
