Atika

Reputation: 59

Scrapy and Selenium error: Element not found in the cache - perhaps the page has changed since it was looked up Stacktrace

I want to extract data from Amazon.
This is my source code:

    from scrapy.contrib.spiders import CrawlSpider
    from scrapy import Selector
    from selenium import webdriver
    from selenium.webdriver.support.select import Select
    from time import sleep
    import selenium.webdriver.support.ui as ui
    from scrapy.xlib.pydispatch import dispatcher
    from scrapy.http import HtmlResponse, TextResponse
    from extraction.items import ProduitItem

    class RunnerSpider(CrawlSpider):
      name = 'products'
      allowed_domains = ['amazon.com']
      start_urls = ['http://www.amazon.com']

      def __init__(self):
         self.driver = webdriver.Firefox()

      def parse(self, response):
        items = []
        sel = Selector(response)
        self.driver.get(response.url)
        recherche = self.driver.find_element_by_xpath('//*[@id="twotabsearchtextbox"]')
        recherche.send_keys("A")
        recherche.submit()
        resultat = self.driver.find_element_by_xpath('//ul[@id="s-results-list-atf"]')
        resultas = resultat.find_elements_by_xpath('//li')
        for result in resultas:
          item = ProduitItem()
          lien = result.find_element_by_xpath('//div[@class="s-item-container"]/div/div/div[2]/div[1]/a')
          lien.click()
          #lien.implicitly_wait(2)
          res = self.driver.find_element_by_xpath('//h1[@id="aiv-content-title"]')
          item['TITRE'] = res.text
          item['IMAGE'] = lien.find_element_by_xpath('//div[@id="dv-dp-left-content"]/div[1]/div/div/img').get_attribute('src')
          items.append(item)

        self.driver.close()
        yield items

When I run my code, I get this error:

Element not found in the cache - perhaps the page has changed since it was looked up Stacktrace:

Upvotes: 0

Views: 464

Answers (1)

GHajba

Reputation: 3691

If you tell Selenium to click on a link, the browser leaves the original page and loads the page behind the link.

In your case you start on a result page that lists links to products on Amazon. You then click one of those links and are moved to the product's detail page. At that point the page has changed, and the remaining elements you want to iterate over in your for loop are no longer there. That is why you get the exception.

Why don't you use the search result page to extract the title and the image? Both are there; you would only need to change the XPath expressions to get the right fields from your lien.

Update

To get the title from the search result page, extract the text of the h2 element inside the a element you currently click.

To get the image, take the other div in the li element: where your XPath selects div[2], select div[1] instead.
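A minimal sketch of that approach, staying on the result page for the whole loop. The function name `scrape_result_page` is hypothetical, and the XPath expressions are adapted from the question's code, so they are assumptions about Amazon's markup that may well need adjusting. Note the leading `.` in the expressions: it keeps each lookup inside the current li. The sketch uses the same `find_element_by_xpath` family as the question; Selenium 4.3 and later replaced these with `find_element(By.XPATH, ...)`. Plain dicts stand in for `ProduitItem` to keep the example self-contained.

```python
def scrape_result_page(driver):
    """Collect title and image URL for each hit without leaving the result page."""
    # Assumed layout, taken from the question's XPath; verify it in the browser.
    base = './/div[@class="s-item-container"]/div/div'

    results = driver.find_elements_by_xpath('//ul[@id="s-results-list-atf"]/li')
    items = []
    for result in results:
        # The leading "." restricts the search to this <li>. A bare "//" would
        # search from the document root and return the first hit every time.
        title = result.find_element_by_xpath(base + '/div[2]/div[1]/a/h2').text
        image = result.find_element_by_xpath(base + '/div[1]//img').get_attribute('src')
        items.append({'TITRE': title, 'IMAGE': image})
    return items
```

Because nothing is clicked, the element references stay valid until the loop finishes.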

If you open the search result page in the browser and inspect the source with the developer tools, you can see which XPath expression to use for each element.
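If the detail pages have to be opened after all, a common workaround (not something proposed above, just a sketch) is to read every link's href into a plain list first and only then navigate. Strings survive a page change; element references do not. The function name and both XPath parameters are hypothetical placeholders, and the calls follow the older Selenium API used in the question.

```python
def scrape_detail_pages(driver, link_xpath, title_xpath):
    """Visit every result link without holding on to stale elements."""
    # Read the href strings while the result page is still loaded.
    links = driver.find_elements_by_xpath(link_xpath)
    urls = [link.get_attribute('href') for link in links]

    titles = []
    for url in urls:
        # Navigating invalidates the old element references,
        # but the URL strings collected above are unaffected.
        driver.get(url)
        titles.append(driver.find_element_by_xpath(title_xpath).text)
    return titles
```

Each detail page is loaded with `driver.get(url)` instead of `click()`, so the loop never touches an element from a page that is no longer displayed.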

Upvotes: 1
