Reputation: 1696
My problem is very similar to scrapyd deploy shows 0 spiders. I tried the accepted answer there several times, but it doesn't work for me, so I'm asking for help here.
The project directory is timediff_crawler, and tree view of the directory is:
timediff_crawler/
├── scrapy.cfg
├── scrapyd-deploy
├── timediff_crawler
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   ├── spiders
│   │   ├── __init__.py
│   │   ├── prod
│   │   │   ├── __init__.py
│   │   │   ├── job
│   │   │   │   ├── __init__.py
│   │   │   │   ├── zhuopin.py
│   │   │   ├── rent
│   │   │   │   ├── australia_rent.py
│   │   │   │   ├── canada_rent.py
│   │   │   │   ├── germany_rent.py
│   │   │   │   ├── __init__.py
│   │   │   │   ├── korea_rent.py
│   │   │   │   ├── singapore_rent.py
...
Scrapyd starts normally:
(crawl_env)web@ha-2:/opt/crawler$ scrapyd
2015-11-11 15:00:37+0800 [-] Log opened.
2015-11-11 15:00:37+0800 [-] twistd 15.4.0 (/opt/crawler/crawl_env/bin/python 2.7.6) starting up.
2015-11-11 15:00:37+0800 [-] reactor class: twisted.internet.epollreactor.EPollReactor.
2015-11-11 15:00:37+0800 [-] Site starting on 6800
...
The content of scrapy.cfg is:
[settings]
default = timediff_crawler.settings
[deploy:ha2-crawl]
url = http://localhost:6800/
project = timediff_crawler
(crawl_env)web@ha-2:/opt/crawler/timediff_crawler$ ./scrapyd-deploy -l
ha2-crawl http://localhost:6800/
(crawl_env)web@ha-2:/opt/crawler/timediff_crawler$ ./scrapyd-deploy ha2-crawl -p timediff_crawler
Packing version 1447229952
Deploying to project "timediff_crawler" in http://localhost:6800/addversion.json
Server response (200):
{"status": "ok", "project": "timediff_crawler", "version": "1447229952", "spiders": 0, "node_name": "ha-2"}
The response shows that the number of spiders is 0, but I actually have about 10 spiders.
I followed the advice in the post scrapyd deploy shows 0 spiders: I deleted all projects, versions, and generated files and directories (including build/, eggs/, project.egg-info, and setup.py) and deployed again, but it doesn't work; the number of spiders is always 0.
I validated the egg file, and the output suggests it is OK:
(crawl_env)web@ha-2:/opt/crawler/timediff_crawler/eggs/timediff_crawler$ unzip -t 1447229952.egg
Archive: 1447229952.egg
testing: timediff_crawler/pipelines.py OK
testing: timediff_crawler/__init__.py OK
testing: timediff_crawler/items.py OK
testing: timediff_crawler/spiders/prod/job/zhuopin.py OK
testing: timediff_crawler/spiders/prod/rent/singapore_rent.py OK
testing: timediff_crawler/spiders/prod/rent/australia_rent.py OK
...
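For reference, the set of spiders that scrapyd actually registered can also be queried through its listspiders.json API (a minimal sketch assuming the default endpoint shown above; only the project name is specific to this setup):

import json

try:                                     # Python 3
    from urllib.request import urlopen
except ImportError:                      # Python 2, as in the environment above
    from urllib2 import urlopen

# Ask scrapyd which spiders it discovered for the uploaded project.
resp = urlopen("http://localhost:6800/listspiders.json?project=timediff_crawler")
print(json.loads(resp.read()))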
So I don't know what's going wrong. Please help, and thanks in advance!
Upvotes: 0
Views: 1042
Reputation: 35
The development environment is different from production: Scrapyd cannot find the other classes. Try moving the related classes or modules under the spiders folder, then fix the references.
To pin down the problematic reference, you can comment out the imports one by one and retry scrapyd-deploy.
├── Myapp
│   └── spider
│       ├── bbc_spider.py
├── Model.py
In the development environment this works, but after deploying with scrapyd it may not, because bbc_spider.py cannot reach Model.py. If you can, move Model.py under the spider folder; that is how I solved the problem.
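A minimal sketch of what this means for the import in bbc_spider.py (SomeModel and the exact import paths are illustrative assumptions, not taken from the original project):

# Myapp/spider/bbc_spider.py
import scrapy

# Before the move, bbc_spider.py imported Model.py from outside the spider
# package (for example "from Model import SomeModel"), which resolves in the
# local checkout but can fail inside the egg built by scrapyd-deploy.
# After moving Model.py into Myapp/spider/, the import stays inside the
# packaged spider folder:
from Myapp.spider.Model import SomeModel


class BbcSpider(scrapy.Spider):
    name = "bbc"
    start_urls = ["https://www.bbc.com"]

    def parse(self, response):
        record = SomeModel()  # hypothetical helper class defined in Model.py
        record.title = response.css("title::text").extract_first()
        yield {"title": record.title}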
Upvotes: 0
Reputation: 1
This is a late reply to this thread, but I believe I figured out why Scrapyd reports back 0 spiders when a new project version is submitted.
In my case my spiders have quite a few dependencies: Redis, Elasticsearch, certifi (for connecting to ES over SSL), and so on. When running Scrapyd on a different machine (the production system), you have to replicate your Python dependencies exactly. On your local machine you won't run into the issue if the same Python virtualenv is active.
I found that, using the standard Scrapy tutorial project, I could successfully add a new project version to Scrapyd. From there I started stripping down and commenting out lines of code and imports in my spiders until I could add them successfully as well. When uncommenting a single import made scrapyd-deploy report 0 spiders again, it dawned on me that this was a dependency issue.
Hopefully this will help someone else having the same issue.
It would be very nice if scrapyd reported in the deploy response whether any dependencies failed to import.
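A quick way to verify this on the Scrapyd box is a small import check, run with the same Python interpreter that Scrapyd uses (a sketch; the module names are just the ones mentioned above, so adjust them to your own dependency list):

import importlib

# If any module prints MISSING here, the spiders in the deployed egg will fail
# to import, and scrapyd ends up counting 0 spiders.
for mod in ("redis", "elasticsearch", "certifi"):
    try:
        importlib.import_module(mod)
        print("{0}: OK".format(mod))
    except ImportError as exc:
        print("{0}: MISSING ({1})".format(mod, exc))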
Upvotes: 0
Reputation: 1696
Thanks to @LearnAWK's advice, the problem turned out to be caused by the following setting in settings.py:
LOG_STDOUT = True
Actually I don't know why this setting affects the result of scrapyd.
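For reference, the fix amounts to removing (or disabling) that line in settings.py; a minimal sketch:

# timediff_crawler/settings.py
# LOG_STDOUT = True   # the problematic line: it redirects stdout into the
#                     # Scrapy log, which broke the spider count on deploy
LOG_STDOUT = False    # False is Scrapy's default, so deleting the line works too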
Upvotes: 4
Reputation: 549
The spiders should be directly in the spiders folder, not in subfolders of it.
The following is the directory tree of the base folder of my Scrapy project. Is yours similar? (Your tree was cut off in the question.)
my_scrapy_project/
├── CAPjobs
│ └── spiders
│ └── oldspiders
├── build
│ ├── bdist.macosx-10.9-x86_64
│ └── lib
│ └── CAPjobs
│ └── spiders
├── db
├── dbs
├── eggs
│ └── CAPjobs
├── items
│ └── CAPjobs
│ ├── spider1
│ ├── spider2
│ └── spider3
├── logs
│ └── CAPjobs
│ ├── spider1
│ ├── spider2
│ └── spider3
└── project.egg-info
Upvotes: 0