imns

Reputation: 5082

Scrapy/Django: limit links crawled

I just got Scrapy set up and running and it works great, but I have two (noob) questions. I should say first that I am totally new to Scrapy and spidering sites.

  1. Can you limit the number of links crawled? I have a site that doesn't use pagination and just lists a lot of links on its home page, which I crawl. I feel bad crawling all of those links when I really just need to crawl the first 10 or so.

  2. How do you run multiple spiders at once? Right now I am using the command scrapy crawl example.com, but I also have spiders for example2.com and example3.com. I would like to run all of my spiders with one command. Is this possible?

Upvotes: 6

Views: 1230

Answers (2)

jsh

Reputation: 1995

Credit goes to Shane, here: https://groups.google.com/forum/?fromgroups#!topic/scrapy-users/EyG_jcyLYmU

Using the CloseSpider extension should allow you to specify limits of this sort.

http://doc.scrapy.org/en/latest/topics/extensions.html#module-scrapy.contrib.closespider

I haven't tried it yet since I didn't need it. It looks like you might also have to enable it as an extension (see the top of the same page) in your settings file.
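
A rough sketch of what that would look like in settings.py (untested; the extension path follows the old scrapy.contrib layout from the page linked above and may differ in newer versions, and 10 is just the limit from the question):

    # settings.py -- sketch only; in some Scrapy versions the CloseSpider
    # extension is already enabled by default and this dict is unnecessary.
    EXTENSIONS = {
        'scrapy.contrib.closespider.CloseSpider': 500,
    }

    # Stop the spider after roughly 10 pages have been crawled.
    CLOSESPIDER_PAGECOUNT = 10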

Upvotes: 1

Jet Guo

Reputation: 243

For #1: don't use the rules attribute to extract and follow links; write your logic in the parse method instead and yield or return Request objects, as in the sketch below.
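
A minimal sketch of that approach (the spider name, start URL, and XPath expression are placeholders, and it assumes a reasonably recent Scrapy):

    import scrapy

    class LimitedSpider(scrapy.Spider):
        # Placeholder name and start URL -- substitute the real site.
        name = 'example.com'
        start_urls = ['http://example.com/']

        def parse(self, response):
            # Collect the candidate links, then follow only the first 10.
            links = response.xpath('//a/@href').extract()[:10]
            for href in links:
                yield scrapy.Request(response.urljoin(href),
                                     callback=self.parse_item)

        def parse_item(self, response):
            # Placeholder extraction; yield whatever fields you need.
            yield {'url': response.url}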

For #2: try scrapyd, which lets you deploy your project and schedule all of its spiders; see the sketch below.
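
Once the project is deployed to a running scrapyd instance, each spider can be scheduled through scrapyd's schedule.json endpoint. A hedged sketch, where 'myproject' is a placeholder for your deployed project name:

    # Schedule every spider on a local scrapyd instance (default port 6800).
    # Assumes the project has already been deployed to scrapyd as
    # 'myproject' and that the requests library is installed.
    import requests

    SPIDERS = ['example.com', 'example2.com', 'example3.com']

    for spider in SPIDERS:
        resp = requests.post('http://localhost:6800/schedule.json',
                             data={'project': 'myproject', 'spider': spider})
        print(spider, resp.json())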

Upvotes: 2
