Reputation: 2853
I'm trying to test out some XPaths using the Scrapy shell, but it seems to be calling on my incomplete spider module to do the scraping, which is not what I want. Is there a way to define which spider Scrapy uses with its shell? More to the point, why is Scrapy doing this at all; shouldn't it know the spider is not ready for use? That's why I'm using the shell, right? Otherwise I'd be using
scrapy crawl spider_name
if I wanted to use a specific spider.
Edit: After looking at the Spider docs forever, I found the following description for the spider instance used in the shell.
spider - the Spider which is known to handle the URL, or a BaseSpider object if there is no spider found for the current URL
This means Scrapy has matched the URL to my spider and is using it instead of a BaseSpider. Unfortunately, my spider is not ready for testing, so is there a way to force the shell to use a BaseSpider instead?
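For reference, launching the shell against a URL covered by my spider's allowed_domains shows the shell's spider variable bound to my spider rather than a BaseSpider (a sketch; the URL and spider name here are placeholders):
$ scrapy shell "http://example.com/some-page"
>>> spider
<MySpider 'my_spider' at 0x...>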
Upvotes: 8
Views: 1906
Reputation: 6710
Scrapy automatically selects the spider based on its allowed_domains attribute. If there is more than one spider for a given domain, Scrapy will use BaseSpider.
But it's just a Python shell, so you can instantiate any spider you want:
>>> from myproject.spiders.myspider import MySpider
>>> spider = MySpider()
>>> spider.parse_item(response)
Edit: as a workaround, to keep your spider from being picked up you can set allowed_domains = [] in it, as sketched below.
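A minimal sketch of that workaround (assuming the old BaseSpider API; the names are placeholders):
from scrapy.spider import BaseSpider

class MySpider(BaseSpider):
    name = 'my_spider'
    # With no allowed domains, no URL is matched to this spider,
    # so the shell falls back to a plain BaseSpider.
    allowed_domains = []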
Upvotes: 7
Reputation: 18529
You should modify your settings file to change DEFAULT_ITEM_CLASS
Per the docs:
The default class that will be used for instantiating items in the Scrapy shell.
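For example, in your project's settings.py (the item class path here is a placeholder):
# settings.py
DEFAULT_ITEM_CLASS = 'myproject.items.MyItem'  # default is 'scrapy.item.Item'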
Upvotes: 1
Reputation: 18385
The shell isn't intended to be used with a spider:
You can try and debug your scraping code very quickly, without having to run the spider. ... [It] is used for testing XPath expressions.
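For instance, once the shell has fetched a page you can try an expression directly against it (a sketch; newer Scrapy exposes response.xpath, while the old BaseSpider-era shell exposed an hxs selector):
>>> response.xpath('//title/text()').extract()   # newer Scrapy
>>> hxs.select('//title/text()').extract()       # older Scrapy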
Upvotes: 1