I am building my first project that incorporates Scrapy. Everything works well on my development server (Windows), but I have a few issues on Heroku. I am using django-dynamic-scraper, which handled a lot of the integration work for me.
On Windows I run the following commands in separate command prompts:
: scrapy server
: python manage.py celeryd -l info
: python manage.py celerybeat
On Heroku I run the following:
: heroku bash >heroku run scrapy server (solves app not found issue)
: heroku run python manage.py celeryd -l info -B --settings=myapp.production
The actual Django app has no errors or issues, and I can access the admin website. The Scrapy server runs:
: Scrapyd web console available at http://0.0.0.0:6800/
: [Launcher] Scrapyd started: max_proc=16, runner='scrapyd.runner'
: Site starting on 6800
: Starting factory <twisted.web.server.Site instance at 0x7f1511f62ab8>
and celery beat and worker are working:
: INFO/Beat] beat: Starting...
: INFO/Beat] Writing entries...
: INFO/MainProcess] Connected to django://guest:**@localhost:5672//
: WARNING/MainProcess] celery@081b4100-eb7f-441c-976d-ecf97d2d7e5a ready.
: INFO/Beat] Writing entries...
: INFO/Beat] Writing entries...
FIRST ISSUE: When the periodic task that runs the spider is triggered, I get the following error in the celery log:
File "/app/.heroku/python/lib/python2.7/site-packages/dynamic_scraper/utils/task_utils.py", line 31, in _pending_jobs
resp = urllib2.urlopen('http://localhost:6800/listjobs.json?project=default')
...
...
File "/app/.heroku/python/lib/python2.7/urllib2.py", line 1184, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 111] Connection refused>
So it seems that, for some reason, Heroku is not allowing celery to reach the Scrapy server on localhost:6800.
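One way to confirm the symptom from inside the process that runs celery is to probe the same endpoint the traceback shows. The following is only a minimal sketch, written for Python 3 (the traceback above is from Python 2.7, where urllib2 plays the role of urllib.request). The helper names listjobs_url and scrapyd_reachable are hypothetical and are not part of django-dynamic-scraper.

```python
import json
import urllib.error
import urllib.request


def listjobs_url(base_url, project="default"):
    """Build the listjobs.json URL seen in the traceback for a given base URL."""
    return "{}/listjobs.json?project={}".format(base_url.rstrip("/"), project)


def scrapyd_reachable(base_url, project="default", timeout=5):
    """Return (True, parsed_json) if the server answers, else (False, reason)."""
    url = listjobs_url(base_url, project)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # URLError covers "Connection refused"; ValueError covers a non-JSON body.
        return False, str(exc)
```

Calling scrapyd_reachable("http://localhost:6800") from the same environment as the celery worker should return a False result with a "Connection refused" reason if nothing is listening on that port there.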
Here are some of my settings:
scrapy.cfg
[settings]
default = myapp.scraper.scrape.settings
[deploy]
#url = http://localhost:6800/
project = myapp
celery config
[config]
app: default:0x7fd4983f6310 (djcelery.loaders.DjangoL
transport: django://guest:**@localhost:5672//
results: database
concurrency: 4 (prefork)
[queues]
celery exchange=celery(direct) key=celery
Thanks in advance and let me know if you need any more info.
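The answer is: on Heroku you can't run your web app, celery, and Scrapy server on the same host and have them talk to each other. Each process runs in its own dyno, so localhost:6800 inside the celery dyno does not reach a Scrapy server running in another dyno. However, there are two ways to accomplish this setup with Heroku.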
Option 1:
localhost:6800 to myapp-scrapy.herokuapp.com.
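The surviving text of Option 1 points at replacing the localhost:6800 address with the host of a separate app. Rather than editing the address in place, one possible approach is to read it from the environment so the same code works on Windows and on Heroku. This is a sketch under assumptions: the SCRAPYD_URL variable and both helper names are hypothetical, the http:// scheme for the herokuapp host is assumed, and django-dynamic-scraper itself (per the traceback in the question) uses a fixed localhost:6800 URL.

```python
import os
from urllib.parse import urlencode

# Matches the address that appears in the question's traceback.
DEFAULT_SCRAPYD_URL = "http://localhost:6800"


def scrapyd_base_url(environ=None):
    """Return the base URL, preferring the hypothetical SCRAPYD_URL variable."""
    env = os.environ if environ is None else environ
    return env.get("SCRAPYD_URL", DEFAULT_SCRAPYD_URL).rstrip("/")


def scrapyd_endpoint(name, environ=None, **params):
    """Build a full endpoint URL such as .../listjobs.json?project=default."""
    url = "{}/{}".format(scrapyd_base_url(environ), name)
    if params:
        url += "?" + urlencode(sorted(params.items()))
    return url
```

With SCRAPYD_URL unset, scrapyd_endpoint("listjobs.json", project="default") yields the localhost URL from the traceback. On Heroku the variable could be set as a config var (for example with heroku config:set) to point at the separate app.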
Option 2:
I hope this helps somebody save some pain.