user11322408

scrapy-selenium driver doesn't follow

from scrapy_selenium import SeleniumRequest
import scrapy
from selenium import webdriver
class testspider1(scrapy.Spider):
    driver=webdriver.Firefox(executable_path=r"C:\Users\test\Desktop\geckodriver")
    name = 'test5'
    start_urls=['http://httpbin.org/ip']
    def parse(self, response):
        print(response.body)
        url = "https://www.target.com/p/cesar-canine-cuisine-filet-mignon-flavor-wet-dog-food-3-5oz-tray/-/A-14903668"
        yield SeleniumRequest(url=url,callback=self.parse_result)
        

    def parse_result(self,response):        
        image = response.xpath('//*[@id="mainContainer"]/div/div/div[1]/div[1]/div[2]/div[1]/div/div/div/div/div/div/div/a/div/div/div/div/div/img/@src').extract_first()
        price = response.selector.xpath('//*[@id="mainContainer"]/div/div/div[1]/div[2]/div/div[1]/span/text()').extract_first()
        print(image)
        print("\n\n")
        print(price)

settings file:

from shutil import which
BOT_NAME = 'seleniumtest'
SPIDER_MODULES = ['seleniumtest.spiders']
NEWSPIDER_MODULE = 'seleniumtest.spiders'
SELENIUM_DRIVER_NAME = 'firefox'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('geckodriver')
SELENIUM_BROWSER_EXECUTABLE_PATH = which(r"C:\Users\test\Desktop\geckodriver")
ROBOTSTXT_OBEY = True
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}

I have followed the scrapy-selenium documentation step by step, but the driver does not follow any links. I believe both requests are being handled by Scrapy. I don't want to change __init__, because I want some requests to be handled by scrapy-selenium and others by Scrapy alone.

I checked the question "passing-selenium-driver-to-scrapy", but it rewrites the entire __init__ to store the Selenium driver as self.driver.

I want some requests to be handled by SeleniumRequest and others by a plain Scrapy Request.

Note: I used this site as an example of a page that uses JavaScript to display its results. If it is handled by Scrapy alone, the data has not been rendered yet, so the XPath queries return empty results.

Upvotes: 1

Views: 936

Answers (1)

alexandre campos

Reputation: 11

I replaced Firefox with Chrome:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().install())
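
If the browser is configured through the scrapy-selenium settings instead of inside the spider, the Chrome equivalent of the settings file from the question might look like the sketch below. The setting names are the ones listed in the scrapy-selenium README. That `chromedriver` is available on PATH and that the browser should run headless are assumptions, not something stated in the question.

```python
from shutil import which

# Assumption: chromedriver is installed and on PATH.
# shutil.which returns None when the executable cannot be found,
# so check this value if the middleware fails to start.
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')

# Assumption: run the browser without a visible window.
SELENIUM_DRIVER_ARGUMENTS = ['--headless']

# Same middleware entry as in the question's settings file.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}
```

With settings like these, the middleware should start its own browser for each SeleniumRequest, so a separate driver created in the spider class is probably not needed. Requests from start_urls, or any plain scrapy.Request, would still go through Scrapy's normal downloader.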

Upvotes: 1
