Reputation: 2853
I'm trying to test out some XPaths using the Scrapy shell, but it seems to be calling on my incomplete spider module to do the scraping, which is not what I want. Is there a way to define which spider Scrapy uses with its shell? More to the point, why is Scrapy doing this at all; shouldn't it know the spider is not ready for use? That's why I'm using the shell, right? Otherwise I'd be using
scrapy crawl spider_name
if I wanted to use a specific spider.
Edit: After looking at the Spider docs forever, I found the following description for the spider instance used in the shell.
spider - the Spider which is known to handle the URL, or a BaseSpider object if there is no spider found for the current URL
This means Scrapy has matched the URL to my spider and is using it instead of a BaseSpider. Unfortunately, my spider is not ready for testing, so is there a way to force the shell to use a BaseSpider instead?
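For reference, launching the shell against a URL covered by my spider's allowed_domains shows the shell's spider variable bound to my spider rather than a BaseSpider (a sketch; the URL and spider name here are placeholders):
$ scrapy shell "http://example.com/some-page"
>>> spider
<MySpider 'my_spider' at 0x...>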
Upvotes: 8
Views: 1906
Reputation: 6710
Scrapy automatically selects the spider based on its allowed_domains attribute. If there is more than one spider for a given domain, Scrapy will use BaseSpider.
But it's just a Python shell, so you can instantiate any spider you want:
>>> from myproject.spiders.myspider import MySpider
>>> spider = MySpider()
>>> spider.parse_item(response)
Edit: as a workaround, to keep your spider from being picked up you can set allowed_domains = [] in it, as sketched below.
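A minimal sketch of that workaround (assuming the old BaseSpider API; the names are placeholders):
from scrapy.spider import BaseSpider

class MySpider(BaseSpider):
    name = 'my_spider'
    # With no allowed domains, no URL is matched to this spider,
    # so the shell falls back to a plain BaseSpider.
    allowed_domains = []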
Upvotes: 7
Reputation: 18529
You should modify your settings file to change DEFAULT_ITEM_CLASS
Per the docs:
The default class that will be used for instantiating items in the Scrapy shell.
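For example, in your project's settings.py (the item class path here is a placeholder):
# settings.py
DEFAULT_ITEM_CLASS = 'myproject.items.MyItem'  # default is 'scrapy.item.Item'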
Upvotes: 1
Reputation: 18385
The shell isn't intended to be used with a spider:
You can try and debug your scraping code very quickly, without having to run the spider. ... [It] is used for testing XPath expressions.
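For instance, once the shell has fetched a page you can try an expression directly against it (a sketch; newer Scrapy exposes response.xpath, while the old BaseSpider-era shell exposed an hxs selector):
>>> response.xpath('//title/text()').extract()   # newer Scrapy
>>> hxs.select('//title/text()').extract()       # older Scrapy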
Upvotes: 1