Reputation: 22983
I have a problem using Scrapy and MongoDB together with Tor. I get the following error when I enable a MongoDB pipeline in Scrapy and run the crawler through proxychains:
2012-11-05 13:41:14-0500 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
|S-chain|-<>-127.0.0.1:9050-<><>-127.0.0.1:27017-<--denied
Traceback (most recent call last):
File "/usr/bin/scrapy", line 4, in <module>
execute()
File "/usr/lib/python2.7/dist-packages/scrapy/cmdline.py", line 131, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/usr/lib/python2.7/dist-packages/scrapy/cmdline.py", line 97, in _run_print_help
func(*a, **kw)
File "/usr/lib/python2.7/dist-packages/scrapy/cmdline.py", line 138, in _run_command
cmd.run(args, opts)
File "/usr/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 42, in run
q = self.crawler.queue
File "/usr/lib/python2.7/dist-packages/scrapy/command.py", line 33, in crawler
self._crawler.configure()
File "/usr/lib/python2.7/dist-packages/scrapy/crawler.py", line 43, in configure
self.engine = ExecutionEngine(self.settings, self._spider_closed)
File "/usr/lib/python2.7/dist-packages/scrapy/core/engine.py", line 33, in __init__
self.scraper = Scraper(self, self.settings)
File "/usr/lib/python2.7/dist-packages/scrapy/core/scraper.py", line 66, in __init__
self.itemproc = itemproc_cls.from_settings(settings)
File "/usr/lib/python2.7/dist-packages/scrapy/middleware.py", line 33, in from_settings
mw = mwcls()
File "/home/bharani/ABCD_scraper/political_forum_scraper/pipelines.py", line 9, in __init__
settings['MONGODB_PORT'])
File "/usr/local/lib/python2.7/dist-packages/pymongo/connection.py", line 290, in __init__
self.__find_node()
File "/usr/local/lib/python2.7/dist-packages/pymongo/connection.py", line 586, in __find_node
raise AutoReconnect(', '.join(errors))
pymongo.errors.AutoReconnect: could not connect to localhost:27017: [Errno 111] Connection refused
I am not sure how to resolve this. When I do not use proxychains, it crawls perfectly fine.
Any help is appreciated.
Thanks.
Edit:
It's not code specific. See this link: http://isbullsh.it/2012/04/Web-crawling-with-scrapy/
This is a simple tutorial on using Scrapy with MongoDB. We are supposed to call
scrapy crawl isbullshit
to run the crawler, which works perfectly fine. To use Tor, it should be called like this:
proxychains scrapy crawl isbullshit
This does not work for me. The source code of the tutorial is here: https://github.com/BaltoRouberol/isbullshit-crawler
Upvotes: 0
Views: 2288
Reputation: 86
proxychains is probably trying to route your MongoDB connection (localhost:27017) through Tor as well, and that connection is denied (see the 127.0.0.1:27017-<--denied line in your output). To exclude loopback connections from proxychains, add the following line to your /etc/proxychains.conf:
localnet 127.0.0.0/255.0.0.0
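To check whether the proxy is what blocks the connection, a small script like the one below can be run once directly and once under proxychains. This is only a sketch: the function name can_connect and the file name check_mongo.py are made up for illustration, and the host and port are taken from the error message in the question.

```python
import socket


def can_connect(host="127.0.0.1", port=27017, timeout=3.0):
    """Return True if a plain TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (e.g. connection refused) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Defaults assume MongoDB listens on localhost:27017, as in the traceback.
    print("MongoDB port reachable:", can_connect())
```

If "python check_mongo.py" reports the port as reachable but "proxychains python check_mongo.py" does not, the local connection is being pushed through the proxy chain, and the localnet exclusion is the likely fix.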
Upvotes: 1
Reputation:
pymongo.errors.AutoReconnect: could not connect to localhost:27017: [Errno 111] Connection refused
So the client cannot connect to localhost on port 27017. Check that this is the correct host and port, and make sure the MongoDB server is actually running in the background; otherwise you will never be able to connect.
If MongoDB is not running and refuses to start after an unclean shutdown, remove the stale lock file (only the lock file, not the whole data directory):
sudo rm /var/lib/mongodb/mongod.lock
and restart the server, for example:
sudo service mongodb start
in Debian or
sudo systemctl restart mongodb
in Arch Linux
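Before deleting a lock file, it may be worth checking whether it is really stale. The sketch below assumes that mongod writes its process ID into mongod.lock and empties the file on a clean shutdown, and that the data directory is /var/lib/mongodb; both can differ between MongoDB versions and distributions. The function name lock_status is hypothetical, and the process check relies on Unix signal semantics.

```python
import os


def lock_status(lock_path="/var/lib/mongodb/mongod.lock"):
    """Classify a mongod lock file as missing, clean, running, stale or unknown."""
    if not os.path.exists(lock_path):
        return "missing"
    with open(lock_path) as f:
        content = f.read().strip()
    if not content:
        # An empty lock file is assumed to mean a clean shutdown.
        return "clean"
    try:
        pid = int(content)
    except ValueError:
        return "unknown"
    try:
        # Signal 0 sends nothing; it only checks that the process exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return "stale"
    except PermissionError:
        # The process exists but belongs to another user.
        return "running"
    return "running"
```

Calling print(lock_status()) (with sudo if the data directory is not readable) and getting "stale" suggests that removing the lock file is reasonable; "running" means a mongod process is still alive and the lock should be left alone.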
Upvotes: 2