Reputation: 21
I am reading Learning Scrapy
by Dimitrios Kouzis-Loukas. I have a question about the "Two-direction crawling with a spider"
part in chapter 3, page 58.
The original code is as follows:
import urlparse  # Python 2; the book's examples target Python 2
from scrapy.http import Request

def parse(self, response):
    # Get the next index URLs and yield Requests
    next_selector = response.xpath('//*[contains(@class,"next")]//@href')
    for url in next_selector.extract():
        yield Request(urlparse.urljoin(response.url, url))
    # Get item URLs and yield Requests
    item_selector = response.xpath('//*[@itemprop="url"]/@href')
    for url in item_selector.extract():
        yield Request(urlparse.urljoin(response.url, url),
                      callback=self.parse_item)
But from my understanding, shouldn't the second loop be nested inside the first one, so that we first download an index page, then download all the item pages it links to, and only after that move on to the next index page?
So I just want to know the order in which the original code operates. Please help!
Upvotes: 1
Views: 79
Reputation: 28216
You can't really merge the two loops. The Request objects yielded in them have different callbacks. The first one will be processed by the parse method (which seems to be parsing a listing of multiple items), and the second by the parse_item method (probably parsing the details of a single item).
As for the order of scraping, Scrapy (by default) uses a LIFO queue, which means the last request created will be processed first.
However, due to the asynchronous nature of Scrapy, it's impossible to say what the exact order will be.
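To see roughly what LIFO scheduling implies here, consider this standard-library sketch (not Scrapy internals, and the page names are made up): each index page yields its next-index request first and its item requests afterwards, exactly like the two loops in the book's parse method.

```python
from collections import deque

# Hypothetical site: each index page links to the next index page
# and to its item pages (all names are invented for illustration).
PAGES = {
    "index1": {"next": ["index2"], "items": ["item-a1", "item-a2"]},
    "index2": {"next": [], "items": ["item-b1", "item-b2"]},
}

def crawl(lifo=True):
    """Simulate a scheduler: requests yielded while parsing a page are
    pushed onto a queue; LIFO pops the most recently pushed one first."""
    queue = deque(["index1"])
    order = []
    while queue:
        url = queue.pop() if lifo else queue.popleft()
        order.append(url)
        page = PAGES.get(url)
        if page:  # only index pages yield further requests
            for nxt in page["next"]:    # first loop: next-index URLs
                queue.append(nxt)
            for item in page["items"]:  # second loop: item URLs
                queue.append(item)
    return order

print(crawl(lifo=True))
# LIFO visits index1's items before moving on to index2
print(crawl(lifo=False))
# FIFO drains all index pages before any items
```

So with the default LIFO behavior you already get, approximately, the order you wanted: the item pages of the current index page tend to be fetched before the next index page (modulo Scrapy's concurrency). If you ever need strict breadth-first order instead, the Scrapy FAQ documents switching to FIFO queues via the DEPTH_PRIORITY, SCHEDULER_DISK_QUEUE, and SCHEDULER_MEMORY_QUEUE settings.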
Upvotes: 1