Reputation: 3083
For some reason Scrapy will no longer run on my machine. I tried upgrading Scrapy, uninstalling it, and reinstalling it, but no dice. Can anyone shed some light on this?
Here is the trace:
Slevins-iMac:goodstuff slevin$ scrapy crawl chees
2017-01-28 18:20:38 [scrapy.utils.log] INFO: Scrapy 1.3.0 started (bot: goodstuff)
2017-01-28 18:20:38 [scrapy.utils.log] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'goodstuff.spiders', 'SPIDER_MODULES': ['goodstuff.spiders'], 'USER_AGENT': 'GoodStuff (+http://www.goodstuff.com)', 'DOWNLOAD_DELAY': 0.25, 'BOT_NAME': 'goodstuff'}
2017-01-28 18:20:38 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
Unhandled error in Deferred:
2017-01-28 18:21:53 [twisted] CRITICAL: Unhandled error in Deferred:
2017-01-28 18:21:53 [twisted] CRITICAL:
Traceback (most recent call last):
  File "/Users/slevin/Library/Python/2.7/lib/python/site-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks
    result = g.send(result)
  File "/Library/Python/2.7/site-packages/Scrapy-1.3.0-py2.7.egg/scrapy/crawler.py", line 90, in crawl
    six.reraise(*exc_info)
  File "/Library/Python/2.7/site-packages/Scrapy-1.3.0-py2.7.egg/scrapy/crawler.py", line 72, in crawl
    self.engine = self._create_engine()
  File "/Library/Python/2.7/site-packages/Scrapy-1.3.0-py2.7.egg/scrapy/crawler.py", line 97, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/Library/Python/2.7/site-packages/Scrapy-1.3.0-py2.7.egg/scrapy/core/engine.py", line 69, in __init__
    self.downloader = downloader_cls(crawler)
  File "/Library/Python/2.7/site-packages/Scrapy-1.3.0-py2.7.egg/scrapy/core/downloader/__init__.py", line 88, in __init__
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler)
  File "/Library/Python/2.7/site-packages/Scrapy-1.3.0-py2.7.egg/scrapy/middleware.py", line 58, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/Library/Python/2.7/site-packages/Scrapy-1.3.0-py2.7.egg/scrapy/middleware.py", line 40, in from_settings
    mw = mwcls()
  File "/Users/slevin/Documents/GoodStuff/Scrapers/goodstuff/goodstuff/middleware.py", line 7, in __init__
    self.ua = UserAgent()
  File "/Library/Python/2.7/site-packages/fake_useragent/fake.py", line 17, in __init__
    self.load()
  File "/Library/Python/2.7/site-packages/fake_useragent/fake.py", line 21, in load
    self.data = load_cached()
  File "/Library/Python/2.7/site-packages/fake_useragent/utils.py", line 138, in load_cached
    update()
  File "/Library/Python/2.7/site-packages/fake_useragent/utils.py", line 133, in update
    write(load())
  File "/Library/Python/2.7/site-packages/fake_useragent/utils.py", line 99, in load
    browsers_dict[browser_key] = get_browser_versions(browser)
  File "/Library/Python/2.7/site-packages/fake_useragent/utils.py", line 64, in get_browser_versions
    html = get(settings.BROWSER_BASE_PAGE.format(browser=quote_plus(browser)))
  File "/Library/Python/2.7/site-packages/fake_useragent/utils.py", line 29, in get
    return urlopen(request, timeout=settings.HTTP_TIMEOUT).read()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1197, in do_open
    raise URLError(err)
URLError: <urlopen error timed out>
I have also tried upgrading Scrapy after installing 1.3.0, but I get a permission-denied error when pip tries to uninstall six-1.4.1.
Upvotes: 0
Views: 124
Reputation: 1887
This problem is unrelated to Scrapy and Twisted. As your log shows, you use a custom middleware based on https://github.com/hellysmile/fake-useragent which, in turn, connects to http://useragentstring.com/ to retrieve a list of browser versions, and the request to http://useragentstring.com/pages/useragentstring.php?name= times out. At the time of writing, that page still can't be reached.
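If you want to keep fake-useragent anyway, newer releases accept a fallback string, so the middleware can degrade gracefully instead of crashing when useragentstring.com is unreachable. A minimal sketch of such a middleware (the class name and fallback value are illustrative, not your exact code; check that your installed fake-useragent version supports the fallback argument):

# middleware.py - random User-Agent middleware with an offline fallback.
# Assumes a fake-useragent release that supports the `fallback` argument.
from fake_useragent import UserAgent

class RandomUserAgentMiddleware(object):
    def __init__(self):
        # If useragentstring.com cannot be reached, this string is used
        # instead of raising an error at startup.
        self.ua = UserAgent(fallback=(
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) '
            'AppleWebKit/537.36 (KHTML, like Gecko) '
            'Chrome/55.0.2883.95 Safari/537.36'))

    def process_request(self, request, spider):
        # Pick a random User-Agent for each outgoing request.
        request.headers.setdefault('User-Agent', self.ua.random)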
In my opinion it is real overhead to use such a library (one that depends on a third-party server). Consider using a library that generates fake user agents autonomously instead, like https://pypi.python.org/pypi/user_agent, as sketched below.
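With that package the equivalent middleware needs no network access at all. A short sketch (generate_user_agent comes from the user_agent package; the class name is illustrative):

# middleware.py - User-Agent middleware that generates UA strings locally.
from user_agent import generate_user_agent

class OfflineUserAgentMiddleware(object):
    def process_request(self, request, spider):
        # generate_user_agent() builds a realistic User-Agent string
        # locally, so no third-party service is ever contacted.
        request.headers.setdefault('User-Agent', generate_user_agent())

Enable it in settings.py the same way as your current middleware (the priority value here is just an example):

DOWNLOADER_MIDDLEWARES = {
    'goodstuff.middleware.OfflineUserAgentMiddleware': 400,
}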
Upvotes: 1