densol96

Reputation: 85

Scrapy-selenium error: TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'

I am trying to set up scrapy-selenium to do some scraping: I pip-installed scrapy and scrapy-selenium, downloaded chromedriver.exe into my project directory, and updated settings.py:

from shutil import which
  
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
SELENIUM_DRIVER_ARGUMENTS = ['--headless']
  
DOWNLOADER_MIDDLEWARES = {
     'scrapy_selenium.SeleniumMiddleware': 800
     }

I also tried using the full path to the chromedriver location instead of the which function, but I still get this error and I am not sure why:

2023-06-20 10:48:59 [twisted] CRITICAL: Unhandled error in Deferred:

Traceback (most recent call last):
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\crawler.py", line 240, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\crawler.py", line 244, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\twisted\internet\defer.py", line 1947, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\twisted\internet\defer.py", line 1857, in _cancellableInlineCallbacks
    _inlineCallbacks(None, gen, status, _copy_context())
--- <exception caught here> ---
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\twisted\internet\defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\crawler.py", line 129, in crawl
    self.engine = self._create_engine()
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\crawler.py", line 143, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\core\engine.py", line 100, in __init__
    self.downloader: Downloader = downloader_cls(crawler)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\core\downloader\__init__.py", line 97, in __init__
    DownloaderMiddlewareManager.from_crawler(crawler)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\middleware.py", line 68, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\middleware.py", line 44, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\utils\misc.py", line 170, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy_selenium\middlewares.py", line 67, in from_crawler
    middleware = cls(
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy_selenium\middlewares.py", line 51, in __init__
    self.driver = driver_klass(**driver_kwargs)
builtins.TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'

2023-06-20 10:48:59 [twisted] CRITICAL: 
Traceback (most recent call last):
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\twisted\internet\defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\crawler.py", line 129, in crawl
    self.engine = self._create_engine()
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\crawler.py", line 143, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\core\engine.py", line 100, in __init__
    self.downloader: Downloader = downloader_cls(crawler)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\core\downloader\__init__.py", line 97, in __init__
    DownloaderMiddlewareManager.from_crawler(crawler)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\middleware.py", line 68, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\middleware.py", line 44, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy\utils\misc.py", line 170, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy_selenium\middlewares.py", line 67, in from_crawler
    middleware = cls(
  File "C:\Users\denis\Desktop\Scrapy_Study\pythonProject\venv\Lib\site-packages\scrapy_selenium\middlewares.py", line 51, in __init__
    self.driver = driver_klass(**driver_kwargs)
TypeError: WebDriver.__init__() got an unexpected keyword argument 'executable_path'

Can anyone help me fix this?

Upvotes: 1

Views: 1556

Answers (2)

malmike21

Reputation: 111

Selenium 4 switched from the executable_path keyword to a Service object when creating a webdriver. Those changes are not reflected in the current release of the scrapy-selenium package. To fix this, I'd advise:
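The incompatibility can be reproduced without Selenium at all. The stand-in class below (hypothetical, for illustration only) mirrors how a Selenium 4 driver's __init__ signature rejects the old keyword, producing the same error seen in the traceback:

```python
# Stand-in for a Selenium 4 driver class (hypothetical, for illustration):
# its __init__ accepts service/options but no longer executable_path.
class Selenium4Chrome:
    def __init__(self, options=None, service=None):
        self.options = options
        self.service = service

# Passing the old Selenium 3 keyword reproduces the error from the traceback.
caught = None
try:
    Selenium4Chrome(executable_path="/path/to/chromedriver")
except TypeError as err:
    caught = str(err)

print(caught)  # ... got an unexpected keyword argument 'executable_path'
```

This is exactly what happens inside scrapy_selenium/middlewares.py at self.driver = driver_klass(**driver_kwargs): the kwargs still contain executable_path, which the installed Selenium no longer accepts.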

  1. Forking the project on GitHub: https://github.com/clemfromspace/scrapy-selenium/fork

  2. In scrapy_selenium/middlewares.py, create a service object and pass that instead of executable_path when creating the webdriver (similar to the changes in this PR: https://github.com/clemfromspace/scrapy-selenium/pull/135/files).

    if driver_executable_path is not None:
        # Selenium 4: wrap the driver path in a Service object
        service_module = import_module(f'{webdriver_base_path}.service')
        service_klass = getattr(service_module, 'Service')
        service_kwargs = {
            'executable_path': driver_executable_path,
        }
        service = service_klass(**service_kwargs)
        driver_kwargs = {
            'service': service,
            'options': driver_options,
        }
        self.driver = driver_klass(**driver_kwargs)

  3. Run the unit tests using python -m unittest discover -p "test_*.py" to confirm everything still works as expected.

  4. Commit and push your changes

  5. Uninstall the original package: pip uninstall scrapy-selenium

  6. Install your fork: pip install git+{https://your_repository}
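Outside the middleware, the shape of the fix in step 2 can be sketched with stand-in classes (hypothetical, no Selenium required) that mirror the Selenium 4 API shape:

```python
# Stand-in classes (hypothetical) mirroring Selenium 4's API shape.
class Service:
    def __init__(self, executable_path):
        self.executable_path = executable_path

class WebDriver:
    def __init__(self, service=None, options=None):
        self.service = service
        self.options = options

driver_executable_path = "/path/to/chromedriver"
driver_options = ["--headless"]

# Old (Selenium 3) call that now fails:
#   WebDriver(executable_path=driver_executable_path, options=driver_options)
# New (Selenium 4) call: wrap the path in a Service object first.
service = Service(executable_path=driver_executable_path)
driver = WebDriver(service=service, options=driver_options)

print(driver.service.executable_path)  # /path/to/chromedriver
```

The driver path still reaches the browser process; it just travels through the Service object instead of a direct keyword argument.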

NB: you can use the same settings.py configuration as before when setting up the package in your project.

Upvotes: 1

Jerry

Reputation: 41

I helped solve this in this GitHub issue: https://github.com/clemfromspace/scrapy-selenium/issues/128. Note that I'm using Scrapy to create the web scrapers and Selenium to interact with the website.

  • go to ton77v's commit 5c3fe7b and copy his code in middlewares.py
  • replace the middlewares.py code under the scrapy_selenium package on your local machine (for me, it was in C:/Users//AppData/Local/anaconda3/Lib/site-packages/scrapy_selenium/middlewares.py)
  • [optional] I also had to pip install webdriver-manager
  • for your scrapy spider, modify the settings.py file (one of the configuration files created when you start a Scrapy project, alongside items.py, middlewares.py, and pipelines.py) and add the following lines:
    • SELENIUM_DRIVER_NAME = 'chrome'
    • SELENIUM_DRIVER_EXECUTABLE_PATH = None  # not actually necessary; it works even if you comment this line out
    • SELENIUM_DRIVER_ARGUMENTS = []  # put '--headless' in the brackets to prevent the browser popup
  • then run scrapy runspider <scraper_name>.py in your terminal and enjoy!
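Put together, the settings.py additions from the steps above look like this (a config fragment using the values given in this answer; the middleware priority of 800 matches the question's setup):

```python
# settings.py — configuration for the patched scrapy-selenium middleware
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = None  # ignored by the patched middleware
SELENIUM_DRIVER_ARGUMENTS = []          # add '--headless' to prevent the browser popup

DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800,
}
```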

Quick explanation of what's happening:

  • the patched middleware uses webdriver-manager to download the browser driver for you, so you no longer have to specify the driver location
  • the beauty is that after the first download, webdriver-manager remembers the installation location and reuses the cached driver on subsequent runs
  • you can adapt the scraper to open other browsers by modifying the middlewares.py file (get ChatGPT to do it for you XD) and changing SELENIUM_DRIVER_NAME to the browser name

Upvotes: 2
