Pravesh Jain

Reputation: 4288

Cannot initiate Firefox from Scrapy script, but it runs fine from the command line

I am using Scrapy for my crawling needs. For dynamic webpages, I use Selenium to load the page in Firefox. Since the code has to run on an AWS instance, I am using PyVirtualDisplay to create a virtual display for Firefox. The whole process worked fine for months, until it stopped working today without any changes to the code.

Now, when I run my crawler using the command scrapy crawl amazon, I get an error saying:

Message: The browser appears to have exited before we could connect. If you specified a log_file in the FirefoxBinary constructor, check it for details.

So I checked whether it works from the Scrapy shell. I tried the following:

scrapy shell <url>
>>> from selenium import webdriver
>>> from pyvirtualdisplay import Display
>>> display = Display(visible=0, size=(800, 600))
>>> display.start()
<Display cmd_param=['Xvfb', '-br', '-screen', '0', '800x600x24', ':106835'] cmd=['Xvfb', '-br', '-screen', '0', '800x600x24', ':106835'] oserror=None returncode=None stdout="None" stderr="None" timeout=False>
>>> browser = webdriver.Firefox()
>>> browser.get(response.url)

As you can see, Firefox started without any error here. I can even see Firefox running as a process afterwards:

ps -ef | grep firefox
ubuntu 26377 24202 42 19:12 pts/1 00:00:01 /usr/lib/firefox/firefox -foreground
ubuntu 26435 31306 0 19:12 pts/0 00:00:00 grep --color=auto firefox

I can even find elements and do everything else I need through the shell. Why won't the same code work when run from the crawler script?

Upvotes: 1

Views: 527

Answers (2)

Pravesh Jain

Reputation: 4288

After a lot of experimentation, I finally found something that works (I am not sure why, though).

The approach shown in the question works from the Scrapy shell but not from a script. If I create the WebDriver object and specify the Firefox binary explicitly, it works from the script too. Here is the code:

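from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

# Firefox's own output is written to this file, which helps when startup fails.
log_file = open('/home/ubuntu/log.txt', 'w')
# Name the Firefox executable explicitly rather than relying on Selenium to locate it.
binary = FirefoxBinary('/usr/bin/firefox', log_file=log_file)
browser = webdriver.Firefox(firefox_binary=binary)
browser.get(url)  # url is the page to load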

This works perfectly fine. If someone can explain why Firefox only starts from the script when the binary is specified this way, I would be grateful.
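One way to look into the difference is to compare what the shell and the script actually see. The following is only a minimal diagnostic sketch using the standard library; it assumes (without any confirmation from the behaviour above) that the cause might lie in the environment, such as PATH or DISPLAY. The function name firefox_environment_report and its dictionary keys are made up for illustration.

```python
import os
import shutil


def firefox_environment_report(explicit_path="/usr/bin/firefox"):
    """Collect settings that can differ between a shell session and a script."""
    return {
        # Where a PATH lookup finds Firefox (None if it is not on PATH).
        "firefox_on_path": shutil.which("firefox"),
        # Whether the explicitly named binary exists and is executable.
        "explicit_path_usable": os.path.isfile(explicit_path)
        and os.access(explicit_path, os.X_OK),
        # The X display this process would use (None if unset).
        "display": os.environ.get("DISPLAY"),
        # The full search path, for a side-by-side comparison.
        "path": os.environ.get("PATH", ""),
    }
```

Printing the returned dictionary once from the Scrapy shell and once from inside the spider, and comparing the two, might show whether the two environments differ. If they are identical, the cause lies elsewhere, and the log file passed to FirefoxBinary is the next place to look.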

Upvotes: 1

Rahul

Reputation: 3396

Your code works for me. You can also try shutting down the browser and stopping the display when you are done:

from selenium import webdriver
from pyvirtualdisplay import Display

display = Display(visible=0, size=(1024, 768))
display.start()
browser = webdriver.Firefox()
browser.get(response.url)
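browser.quit()   # quit() ends the whole browser session; close() only closes the current window
display.stop()   # PyVirtualDisplay's Display is shut down with stop()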
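If a crawl fails partway through, a leftover Firefox or Xvfb process can linger on the machine. Below is a minimal sketch of one way to guarantee cleanup with try/finally. The function name fetch_page_source and its make_display / make_browser parameters are hypothetical, introduced only so the pattern can be exercised without a real browser; by default it uses Display and webdriver.Firefox as in the code above.

```python
def fetch_page_source(url, make_display=None, make_browser=None):
    """Load url in Firefox on a virtual display and return the page source.

    make_display and make_browser are hypothetical hooks for illustration;
    when omitted, PyVirtualDisplay and Selenium's Firefox driver are used.
    """
    if make_display is None:
        from pyvirtualdisplay import Display
        make_display = lambda: Display(visible=0, size=(1024, 768))
    if make_browser is None:
        from selenium import webdriver
        make_browser = webdriver.Firefox

    display = make_display()
    display.start()
    try:
        browser = make_browser()
        try:
            browser.get(url)
            return browser.page_source
        finally:
            # Runs even if get() raises, so no Firefox process is left behind.
            browser.quit()
    finally:
        # Runs even if the browser failed to start, so Xvfb is always stopped.
        display.stop()
```

Inside a spider this could be called as fetch_page_source(response.url). The nesting matters: the display is stopped only after the browser has quit, and it is still stopped if the browser never started.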

Upvotes: 0
