Dmitrijs Zubriks

Reputation: 2806

How to make two CrawlSpider Rules cooperate

I use this CrawlSpider example as the backbone for my crawler.

I want to implement this idea:

The first Rule follows the links. The matched pages are then passed on to the second Rule, which matches new links against its pattern and calls its callback on them.

For example, I have these Rules:

...

start_urls = ['http://play.google.com/store']

rules = (
    Rule(SgmlLinkExtractor(allow=('/store/apps',))),
    Rule(SgmlLinkExtractor(allow=('/details\?id=',)), callback='parse_app'),
)

...

How I expect the parser to work:

  1. Open http://play.google.com/store and match the first URL 'https://play.google.com/store/apps/category/SHOPPING/collection/topselling_free'

  2. Pass the found URL ('https://play.google.com/store/apps/category/SHOPPING/collection/topselling_free') to the second Rule

  3. The second Rule tries to match its pattern (allow=('.*/details\?id=',)) and, if it matches, calls the callback 'parse_app' for that URL.

At the moment, the crawler just walks through all the links and doesn't parse anything.

Upvotes: 0

Views: 102

Answers (1)

paul trmbrth

Reputation: 20748

As Xu Jiawan implies, URLs matching /details\?id= also match /store/apps (from what I saw briefly).
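The overlap is easy to verify: Scrapy's link extractors apply allow patterns as unanchored regex searches against the full URL, so a quick check with Python's re module (the app id below is made up for illustration) shows that a detail link satisfies both rules' patterns:

```python
import re

# The two allow patterns from the question's rules.
store_apps = re.compile(r'/store/apps')
details = re.compile(r'/details\?id=')

# An illustrative app-detail URL (the id is invented).
url = 'https://play.google.com/store/apps/details?id=com.example.app'

# The detail URL also contains '/store/apps', so BOTH patterns match
# under an unanchored search.
print(bool(store_apps.search(url)))  # True
print(bool(details.search(url)))     # True
```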

CrawlSpider hands each extracted link to the first Rule whose extractor matches it, so try changing the order of the rules to have the parse_app Rule match first:

rules = (
    Rule(SgmlLinkExtractor(allow=('/details\?id=',)), callback='parse_app'),
    Rule(SgmlLinkExtractor(allow=('/store/apps',))),
)
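A minimal pure-Python sketch of that first-match-wins dispatch (a simplified model of how CrawlSpider assigns links to rules, using the patterns from the question):

```python
import re

# (pattern, callback name) pairs in rule order; None means "follow only".
RULES = [
    (re.compile(r'/details\?id='), 'parse_app'),
    (re.compile(r'/store/apps'), None),
]

def dispatch(url):
    """Return the callback of the first rule whose pattern matches url."""
    for pattern, callback in RULES:
        if pattern.search(url):
            return callback
    return None  # no rule matched: the link is not followed

# With parse_app's rule first, detail pages now reach the callback...
print(dispatch('https://play.google.com/store/apps/details?id=com.example'))  # parse_app
# ...while category pages are still followed without a callback.
print(dispatch('https://play.google.com/store/apps/category/SHOPPING'))  # None
```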

Or use deny:

rules = (
    Rule(SgmlLinkExtractor(allow=('/store/apps',), deny=('/details\?id=',))),
    Rule(SgmlLinkExtractor(allow=('/details\?id=',)), callback='parse_app'),
)

If you want the first Rule() to be applied only to 'http://play.google.com/store', and then use the second Rule() to call parse_app, you may need to implement the parse_start_url method to generate Requests using SgmlLinkExtractor(allow=('/store/apps',)).

Something like:

from scrapy.http import Request
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class PlaystoreSpider(CrawlSpider):
    name = 'playstore'
    #allowed_domains = ['example.com']
    start_urls = ['https://play.google.com/store']

    rules = (
        #Rule(SgmlLinkExtractor(allow=('/store/apps',), deny=('/details\?id=',))),
        Rule(SgmlLinkExtractor(allow=('/details\?id=',)), callback='parse_app'),
    )

    def parse_app(self, response):
        self.log('Hi, this is an app page! %s' % response.url)
        # do something


    def parse_start_url(self, response):
        return [Request(url=link.url)
                for link in SgmlLinkExtractor(
                    allow=('/store/apps',), deny=('/details\?id=',)
                ).extract_links(response)]

Upvotes: 1
