user1592380

Reputation: 36307

Scrapy: SELENIUM_DRIVER_NAME and SELENIUM_DRIVER_EXECUTABLE_PATH must be set


I'm trying out the scrapy-selenium package: https://github.com/clemfromspace/scrapy-selenium

I've followed the directions on the main GitHub page above. I started a new Scrapy project and created a spider:

import scrapy
from scrapy_selenium import SeleniumRequest

from shutil import which

SELENIUM_DRIVER_NAME = 'firefox'

SELENIUM_DRIVER_EXECUTABLE_PATH = which('geckodriver')
SELENIUM_DRIVER_ARGUMENTS=['-headless']  # '--headless' if using chrome instead of firefox



class MySpider(scrapy.Spider):

    start_urls = ["http://yahoo.com"]
    name = 'test'


    def start_requests(self):
        for url in self.start_urls:

            yield SeleniumRequest(url, self.parse_index_page)


    def parse_index_page(self, response):
        ....

I've downloaded the latest geckodriver and set its path as shown above.

The output contains:

 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2019-07-05 14:14:44 [scrapy.middleware] WARNING: Disabled SeleniumMiddleware: SELENIUM_DRIVER_NAME and SELENIUM_DRIVER_EXECUTABLE_PATH must be set
2019-07-05 14:14:44 [scrapy.middleware] INFO: Enabled downloader middlewares:
2019-07-05 14:56:59 [scrapy.middleware] INFO: Enabled downloader middlewares:

['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']

I don't see the Selenium downloader middleware in that list, and I see this warning:

WARNING: Disabled SeleniumMiddleware: SELENIUM_DRIVER_NAME and SELENIUM_DRIVER_EXECUTABLE_PATH must be set. 

What am I doing wrong?

EDIT:

I ended up putting:

# -*- coding: utf-8 -*-
import os
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

SELENIUM_DRIVER_NAME = 'firefox'
SELENIUM_DRIVER_EXECUTABLE_PATH = 'E:/ENVS/r3/scrapySelenium/geckodriver.exe'
SELENIUM_DRIVER_ARGUMENTS = []  # '--headless' if using chrome instead of firefox

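# PATH entries must be directories, so add the folder that contains geckodriver.exe
os.environ["PATH"] += os.pathsep + os.path.dirname(SELENIUM_DRIVER_EXECUTABLE_PATH)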
os.environ["PATH"] += os.pathsep + '..../AppData/Local/Mozilla Firefox'



firefox_capabilities = DesiredCapabilities.FIREFOX
firefox_capabilities['marionette'] = True
firefox_capabilities['binary'] = '..../AppData/Local/Mozilla Firefox/firefox.exe'

driver = webdriver.Firefox(capabilities=firefox_capabilities)

in settings.py, after working through a slew of error messages. This eventually got it working.

Upvotes: 1

Views: 4001

Answers (1)

Gallaecio

Reputation: 3857

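In your spider file those settings (SELENIUM_DRIVER_*) are only module-level variables, and Scrapy never reads them from there. You must add them to your Scrapy settings, which are usually defined in your project's settings.py file.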
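As a rough sketch, the relevant part of settings.py might look like the following. The three SELENIUM_DRIVER_* values are copied from the question. The DOWNLOADER_MIDDLEWARES entry and its priority of 800 follow the scrapy-selenium README as I remember it, so check them against the version you have installed. The which() lookup assumes geckodriver is on your PATH; if it is not, it returns None and the middleware will still be disabled.

```python
# settings.py (sketch): project-level settings that Scrapy actually loads
from shutil import which

# Values taken from the question; adjust for your own driver and browser.
SELENIUM_DRIVER_NAME = 'firefox'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('geckodriver')  # None if not on PATH
SELENIUM_DRIVER_ARGUMENTS = ['-headless']  # '--headless' if using chrome

# Enable the middleware itself; 800 is the priority suggested by the
# scrapy-selenium README (an assumption to verify for your version).
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800,
}
```

If you would rather keep the values next to the spider, Scrapy also reads a custom_settings dict defined as a class attribute on the spider, so the same keys can go there instead.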

Upvotes: 2
