Daniel

Reputation: 1696

scrapyd deploy shows 0 spiders by scrapyd-client

I found that my problem is very similar to scrapyd deploy shows 0 spiders. I also tried the accepted answer several times, but it doesn't work for me, so I'm asking for help here.

The project directory is timediff_crawler, and tree view of the directory is:

timediff_crawler/
├── scrapy.cfg
├── scrapyd-deploy
├── timediff_crawler
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   ├── spiders
│   │   ├── __init__.py
│   │   ├── prod
│   │   │   ├── __init__.py
│   │   │   ├── job
│   │   │   │   ├── __init__.py
│   │   │   │   ├── zhuopin.py
│   │   │   ├── rent
│   │   │   │   ├── australia_rent.py
│   │   │   │   ├── canada_rent.py
│   │   │   │   ├── germany_rent.py
│   │   │   │   ├── __init__.py
│   │   │   │   ├── korea_rent.py
│   │   │   │   ├── singapore_rent.py
...

1.1 start scrapyd (this works fine)

(crawl_env)web@ha-2:/opt/crawler$ scrapyd
2015-11-11 15:00:37+0800 [-] Log opened.
2015-11-11 15:00:37+0800 [-] twistd 15.4.0 (/opt/crawler/crawl_env/bin/python 2.7.6) starting up.
2015-11-11 15:00:37+0800 [-] reactor class: twisted.internet.epollreactor.EPollReactor.
2015-11-11 15:00:37+0800 [-] Site starting on 6800
...

1.2 edit scrapy.cfg

[settings]
default = timediff_crawler.settings

[deploy:ha2-crawl]
url = http://localhost:6800/
project = timediff_crawler

1.3 deploy the project

(crawl_env)web@ha-2:/opt/crawler/timediff_crawler$ ./scrapyd-deploy -l
ha2-crawl            http://localhost:6800/

(crawl_env)web@ha-2:/opt/crawler/timediff_crawler$ ./scrapyd-deploy ha2-crawl -p timediff_crawler
Packing version 1447229952
Deploying to project "timediff_crawler" in http://localhost:6800/addversion.json
Server response (200):
{"status": "ok", "project": "timediff_crawler", "version": "1447229952", "spiders": 0, "node_name": "ha-2"}

1.4 the problem

The response shows that the number of spiders is 0, but I actually have about 10 spiders.

I followed the advice in the post scrapyd deploy shows 0 spiders: I deleted all projects, versions, and generated directories (including build/, eggs/, project.egg-info, and setup.py) and deployed again, but it doesn't work; the number of spiders is always 0.

I validated the egg file, and the output shows it seems OK:

(crawl_env)web@ha-2:/opt/crawler/timediff_crawler/eggs/timediff_crawler$ unzip -t 1447229952.egg 
Archive:  1447229952.egg

testing: timediff_crawler/pipelines.py   OK
testing: timediff_crawler/__init__.py   OK
testing: timediff_crawler/items.py   OK
testing: timediff_crawler/spiders/prod/job/zhuopin.py   OK
testing: timediff_crawler/spiders/prod/rent/singapore_rent.py   OK
testing: timediff_crawler/spiders/prod/rent/australia_rent.py   OK
...
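
The spiders are also discoverable locally. Here is a minimal check with Scrapy's spider loader (a sketch assuming Scrapy 1.0+, run from the project root so the project settings are picked up) that lists all of them:

from scrapy.utils.project import get_project_settings
from scrapy.spiderloader import SpiderLoader

settings = get_project_settings()              # reads timediff_crawler.settings
loader = SpiderLoader.from_settings(settings)  # walks the spiders packages
print(loader.list())                           # prints the ~10 spider names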

So I don't know what's going wrong. Please help, and thanks in advance!

Upvotes: 0

Views: 1042

Answers (4)

MrCode

Reputation: 35

The development environment is different from production, and scrapyd may not be able to find your other classes or modules. Try moving the related classes or modules under the spiders folder, then fix the references accordingly.

To identify the problematic reference, you can comment out the imports one by one and retry scrapyd-deploy.

├── Myapp
│   └── spider
│       └── bbc_spider.py
├── Model.py

In the development environment this works, but after deploying with scrapyd it may not, because bbc_spider.py cannot reach Model.py. In that case, move Model.py under the spider folder. I solved my problem like that.
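
As a sketch (the class name MyModel and the spider body are hypothetical; this also assumes Myapp and spider are packages, i.e. contain __init__.py), the import in bbc_spider.py would change roughly like this after the move:

# bbc_spider.py -- a minimal sketch with hypothetical names.
# Before the move (Model.py at the project root), this import works in
# development but can fail inside the egg that scrapyd builds:
#   from Model import MyModel
# After moving Model.py into the spider folder, import it via the package path:
from Myapp.spider.Model import MyModel

import scrapy

class BbcSpider(scrapy.Spider):
    name = "bbc"
    start_urls = ["http://www.bbc.com"]

    def parse(self, response):
        yield MyModel.parse_page(response)  # hypothetical helper call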

Upvotes: 0

colorado5280

Reputation: 1

This is a late reply to this thread, but I believe I figured out why 0 spiders is reported back when submitting a new project version to Scrapyd.

In my case my spiders have quite a few dependencies: Redis, Elasticsearch, certifi (for connecting to ES using SSL), etc. When running Scrapyd on a different machine (a production system), you have to replicate your Python dependencies exactly. On a local machine, if you have the same Python virtualenv active, you won't run into the issue.

I found that, using the standard scrapy tutorial, I could successfully add a new project version to Scrapyd. From there I started stripping down and commenting out lines of code and imports in my spiders until I was able to successfully add them to Scrapyd. When I could get scrapyd-deploy to report back 0 spiders by uncommenting a single import, it dawned on me that it was a dependency issue.
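
A quick way to surface the failing import (a debugging sketch, not part of Scrapyd; the module paths are hypothetical placeholders) is to import each spider module directly with the production virtualenv active. Whatever raises here is what makes Scrapyd skip the module and report 0 spiders:

# check_imports.py -- a sketch; replace the module paths with your own.
import importlib
import traceback

SPIDER_MODULES = [
    "myproject.spiders.spider_a",  # hypothetical module paths
    "myproject.spiders.spider_b",
]

for mod in SPIDER_MODULES:
    try:
        importlib.import_module(mod)
        print("OK      %s" % mod)
    except Exception:
        # e.g. ImportError for a missing dependency such as certifi
        print("FAILED  %s" % mod)
        traceback.print_exc()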

Hopefully this will help someone else having the same issue.

It would be very nice to have scrapyd report back in the deploy response if there were dependencies that failed.

Upvotes: 0

Daniel

Reputation: 1696

Thanks to @LearnAWK's advice, the problem turned out to be caused by the following configuration in settings.py:

LOG_STDOUT = True

Actually I don't know why this configuration affects the result of scrapyd.
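
(A plausible explanation, as far as I can tell: scrapyd determines the spider count by running the project's spider list in a subprocess and parsing its stdout, and with LOG_STDOUT = True Scrapy redirects stdout into the log, so scrapyd reads nothing and reports 0.) A minimal sketch of the fix in settings.py:

# settings.py -- the relevant line; everything else stays as before.
LOG_STDOUT = False  # or delete the LOG_STDOUT = True line entirely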

Upvotes: 4

LearnAWK

Reputation: 549

The spiders should be directly in the spiders folder, not in subfolders of it.

The following is the directory tree within the base folder of my Scrapy project. Is yours similar? (Yours was cut off in your question.)

my_scrapy_project/   
├── CAPjobs    
│   └── spiders    
│       └── oldspiders    
├── build    
│   ├── bdist.macosx-10.9-x86_64    
│   └── lib    
│       └── CAPjobs    
│           └── spiders    
├── db    
├── dbs    
├── eggs    
│   └── CAPjobs    
├── items    
│   └── CAPjobs    
│       ├── spider1    
│       ├── spider2    
│       └── spider3    
├── logs   
│   └── CAPjobs    
│       ├── spider1    
│       ├── spider2    
│       └── spider3 
└── project.egg-info    

Upvotes: 0
