tomgal

Reputation: 136

Scrapy call request in a loop

I want to scrape a web page that contains a combobox with filtering options. The base URL is always the same, but the request payload depends on the selected combobox value. I have a list of the available options, and I've created a loop that iterates over the combobox values and executes a request. Code below:

def parse_product_lines(self, response):
    options = json.loads(response.body_as_unicode())
    product_lines = options['products']

    for product_line in product_lines:
        payload = self.prepare_payload(product_line)

        scrapy.Request('http://example.com',
                       method="POST",
                       body=urllib.urlencode(payload),
                       callback=self.parse_items)

def parse_items(self, response):
    print response

but the requests are never executed. Does anybody know what's going on here?

Upvotes: 1

Views: 3272

Answers (2)

eLRuLL

Reputation: 18799

Scrapy doesn't wait for a Request to finish (unlike other request libraries); it dispatches requests asynchronously.

These requests (and items) are handled by yielding them from the callback methods: Scrapy treats those methods as generators, checks whether each yielded object is an item or a request, returns the items, and schedules each request to be handled later by the method given in its callback parameter.

So don't just call Request; yield the Request so that Scrapy can schedule it.
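For illustration, here is the asker's method with that one change applied (a minimal sketch, keeping the question's Python 2 idioms such as urllib.urlencode; prepare_payload and the URL come from the question):

    import json
    import urllib

    import scrapy

    def parse_product_lines(self, response):
        options = json.loads(response.body_as_unicode())
        product_lines = options['products']

        for product_line in product_lines:
            payload = self.prepare_payload(product_line)

            # yield instead of just calling Request, so Scrapy
            # schedules it and later invokes parse_items
            yield scrapy.Request('http://example.com',
                                 method="POST",
                                 body=urllib.urlencode(payload),
                                 callback=self.parse_items)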

Upvotes: 3

zephor

Reputation: 747

First, a Spider class uses the parse method as its default callback.

Each callback should return an Item, a dict, or an iterator of them.

You should yield the request in your parse_product_lines method to tell Scrapy what to handle next.
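As a sketch of the dict case: the structure of the POST response isn't shown in the question, so the 'items' key and 'name' field below are hypothetical placeholders:

    def parse_items(self, response):
        data = json.loads(response.body_as_unicode())
        # yield plain dicts; Scrapy collects them as items
        for item in data.get('items', []):    # 'items' key is assumed
            yield {'name': item.get('name')}  # 'name' field is assumed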

Upvotes: 5
