Reyraa

Reputation: 4274

Selenium launches the Firefox driver for spiders that do not use it

I use Selenium's Firefox driver to load and scrape web pages in some of the spiders in my Scrapy project.

The problem:
Selenium starts a Firefox instance whenever I run any spider, even those in which I have not imported webdriver or called webdriver.Firefox().

Expected behavior:
Selenium starts a Firefox instance only when I run a spider that calls webdriver.Firefox().

Why is this important?
I quit the Firefox instance when a Selenium spider is done, but that obviously does not happen in spiders that do not use Selenium, so the browser is left open.

The spider that is not using Selenium
This spider does not use Selenium, so I expect it not to launch Firefox.

import scrapy


class MySpider(scrapy.Spider):
    name = "MySpider"
    domain = 'www.example.com'
    # allowed_domains takes bare domain names, not URLs
    allowed_domains = ['example.com']
    start_urls = ['http://example.com']

    def parse(self, response):
        for sel in response.css('.main-content'):
            # Article is a scrapy.Item defined elsewhere in the project
            item = Article()
            item['title'] = sel.css('h1::text').extract()[0]
            item['body'] = sel.css('p::text').extract()[0]
            yield item

Upvotes: 2

Views: 113

Answers (1)

Reyraa

Reputation: 4274

The issue was actually in how I was instantiating webdriver.Firefox in the spiders that were intended to use Selenium:

import scrapy
from selenium import webdriver


class MySpider(scrapy.Spider):
    # basic scrapy settings
    # Problem: this class-level statement runs as soon as the module is imported
    driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        result = scrapy.Selector(text=self.driver.page_source)
        # scrape and yield items to the pipeline
        # then, under a certain condition:
        self.driver.quit()

Why was it happening?
When a Scrapy command runs, Scrapy imports every spider module in the project to discover the spiders, and Python executes each class body at import time. So no matter which spider I was trying to run, a new webdriver.Firefox instance was started for every spider class that contained this line at class level.
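
The effect can be reproduced without Scrapy or Selenium. The following is a minimal sketch in which FakeDriver, EagerSpider and LazySpider are hypothetical stand-ins (not part of either library); it only shows that a class-level assignment runs when the class is defined, while code in __init__ runs when an object is created:

```python
created = []  # one entry per "browser" launched


class FakeDriver:
    """Hypothetical stand-in for webdriver.Firefox; it only records its creation."""

    def __init__(self):
        created.append("launched")


class EagerSpider:
    # The class body is executed when the class statement runs,
    # i.e. as soon as the module containing it is imported.
    driver = FakeDriver()


class LazySpider:
    def __init__(self):
        # Runs only when a LazySpider object is actually created.
        self.driver = FakeDriver()


# Neither class has been instantiated yet, but EagerSpider already launched a driver.
launched_after_definition = len(created)

LazySpider()
launched_after_instantiation = len(created)

print(launched_after_definition, launched_after_instantiation)  # prints: 1 2
```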

Solution
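I moved the webdriver instantiation into the class's __init__ method, so Firefox starts only when that spider is actually instantiated:

def __init__(self, *args, **kwargs):
    # let scrapy.Spider process its own arguments first
    super().__init__(*args, **kwargs)
    self.driver = webdriver.Firefox()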

Upvotes: 2
