Reputation: 332
I'd appreciate it if someone could help me understand how rules stack for depth crawling. Does stacking multiple rules mean the rules are processed one at a time? The aim is to grab links from the main page, return the items and the responses, and pass them to the next rule, which will pass the links to another function, and so on.
rules = (
    Rule(LinkExtractor(restrict_xpaths='--some xpath--'), callback='function_a', follow=True),
    Rule(LinkExtractor(restrict_xpaths='--some xpath--'), callback='function_b', process_links='function_c', follow=True),
)

def function_a(self, response):
    # grab sports, games, link3 from the main page
    i = response.xpath('---some xpath---')
    for xpth in i:
        item = ItemA()
        item['name'] = xpth.xpath('---some xpath--').extract_first()
        yield item
        yield scrapy.Request(url)  # yield each item and url link from function_a back to the second rule

def function_b(self, response):
    # receives responses from the second rule
    # grab links the same way as function_a
    ...

def function_c(self, links):
    # does process_links in the rule send the links it received to function_c?
    ...
Can this be done recursively to deep-crawl a single site? I'm not sure I have the rules concept right. Do I have to add X rules to process pages X levels deep, or is there a better way to handle recursive depth crawls?
Thanks
Upvotes: 3
Views: 1072
Reputation: 976
From the docs, the following passage implies that every rule is applied to every page (my italics):
rules
Which is a list of one (or more) Rule objects. Each Rule defines a certain behaviour for crawling the site. Rules objects are described below. If multiple rules match the same link, the first one will be used, according to the order they’re defined in this attribute.
In your case, target each rule at the appropriate page type and then order the rules in depth order, as in the sketch below.
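A minimal sketch of that layout, assuming a CrawlSpider (the spider name, start URL and XPaths are placeholders, not taken from your code): each LinkExtractor is restricted to the part of the page it should handle, the rules are listed in depth order, and follow=True keeps extracting matching links at any depth, so you do not need one rule per level.

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class DeepCrawlSpider(CrawlSpider):
    name = 'deep_crawl'
    start_urls = ['http://example.com/']  # placeholder

    rules = (
        # Depth 1: category links on the main page.
        Rule(
            LinkExtractor(restrict_xpaths='//div[@id="categories"]'),  # placeholder XPath
            callback='parse_category',
            follow=True,
        ),
        # Depth 2 and deeper: item links inside a category page. process_links
        # is called with the extracted Link objects before requests are made.
        Rule(
            LinkExtractor(restrict_xpaths='//div[@class="items"]'),  # placeholder XPath
            callback='parse_item',
            process_links='filter_links',
            follow=True,
        ),
    )

    def parse_category(self, response):
        # Yield data from a category page; matching links are followed by the
        # rules automatically, so no manual Request is needed here.
        yield {'category': response.xpath('//h1/text()').extract_first()}

    def filter_links(self, links):
        # process_links receives the list of Link objects a rule extracted;
        # return the ones you want requests made for.
        return [link for link in links if 'ignore' not in link.url]

    def parse_item(self, response):
        yield {'name': response.xpath('//h2/text()').extract_first()}

Because every response the CrawlSpider fetches through its rules is matched against the rules again, a single rule with follow=True already crawls recursively; extra rules are only needed for pages that require a different extractor or callback.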
Upvotes: 1