BugHunterUK

Reputation: 8958

Scrapy - TypeError: 'Request' object is not iterable

I'm trying to iterate through a list of URLs returned from the callback passed to a scrapy Request, but I'm getting the following error:

TypeError: 'Request' object is not iterable

The following works. I can see all the extracted URLs flood the terminal:

import scrapy

class PLSpider(scrapy.Spider):
    name = 'pl'
    start_urls = [ 'https://example.com' ]

    def genres(self, resp):
        for genre in resp.css('div.sub-menus a'):
            yield {
                'genre': genre.css('::text').extract_first(),
                'url': genre.css('::attr(href)').extract_first() 
            }

    def extractSamplePackURLs(self, resp):
        return {
            'packs': resp.css('h4.product-title a::attr(href)').extract()
        }

    def extractPackData(self, resp):
        return {
            'title': resp.css('h1.product-title::text').extract_first(),
            'description': resp.css('div.single-product-description p').extract_first()
        }

    def parse(self, resp):
        for genre in self.genres(resp):
            samplePacks = scrapy.Request(genre['url'], callback=self.extractSamplePackURLs)
            yield samplePacks

But if I replace the yield samplePacks line with:

    def parse(self, resp):
        for genre in self.genres(resp):
            samplePacks = scrapy.Request(genre['url'], callback=self.extractSamplePackURLs)
            for pack in samplePacks:
                yield pack

... I get the error I posted above.

Why is this happening, and how can I loop through the value returned by the callback?

Upvotes: 1

Views: 3072

Answers (1)

paul trmbrth

Reputation: 20748

Yielding Request objects in scrapy.Spider callbacks only tells the Scrapy framework to enqueue HTTP requests; it yields Request objects, nothing more. It does not download them immediately, nor does it hand control back once they are downloaded, i.e. after the yield, you still don't have the result. Request objects are not promises, futures, or deferreds. Scrapy is not designed the same way as various async frameworks.

These Request objects will eventually get processed by the framework's downloader, and the response body from each HTTP request will be passed to the associated callback. This is the basis of Scrapy's asynchronous programming pattern.
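In other words, the fix is to chain callbacks rather than iterate over a Request. Here is a minimal sketch using the spider's own method names from the question; the meta dict and the 'genre' key are illustrative assumptions, just one common way to carry data from one callback to the next:

import scrapy

class PLSpider(scrapy.Spider):
    name = 'pl'
    start_urls = [ 'https://example.com' ]

    # genres() as defined in the question above

    def parse(self, resp):
        for genre in self.genres(resp):
            # Just yield the Request; Scrapy downloads it later and
            # calls extractSamplePackURLs with the response.
            yield scrapy.Request(genre['url'],
                                 callback=self.extractSamplePackURLs,
                                 meta={'genre': genre['genre']})

    def extractSamplePackURLs(self, resp):
        for href in resp.css('h4.product-title a::attr(href)').extract():
            # Chain one more Request per pack page; meta carries
            # the genre along to the next callback.
            yield scrapy.Request(resp.urljoin(href),
                                 callback=self.extractPackData,
                                 meta=resp.meta)

    def extractPackData(self, resp):
        yield {
            'genre': resp.meta.get('genre'),
            'title': resp.css('h1.product-title::text').extract_first(),
            'description': resp.css('div.single-product-description p').extract_first()
        }

Each callback yields either more Requests or final items; the loop over results happens inside the callback that receives the response, never over the Request object itself.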

If you want to do something more "procedural-like" in which yield request(...) gets you the HTTP response the next time you have control, you can have a look at https://github.com/rmax/scrapy-inline-requests/.
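For reference, a sketch of what that looks like with scrapy-inline-requests, based on the project's README (verify against the version you install, as the API may change):

import scrapy
from inline_requests import inline_requests

class InlineSpider(scrapy.Spider):
    name = 'inline'
    start_urls = [ 'https://example.com' ]

    @inline_requests
    def parse(self, resp):
        for href in resp.css('div.sub-menus a::attr(href)').extract():
            # Inside a decorated callback, yield suspends execution
            # until the request is downloaded, then hands back the
            # response, which gives the "procedural" flow.
            genre_resp = yield scrapy.Request(resp.urljoin(href))
            yield {
                'packs': genre_resp.css('h4.product-title a::attr(href)').extract()
            }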

Upvotes: 3
