Reputation: 136
I want to scrape a web page which contains a combobox with filtering options. The base URL is the same, but the request payload depends on the selected combobox value. I have a list of the available options and I've created a loop which iterates over the combobox values and executes a request. Code below:
def parse_product_lines(self, response):
    options = json.loads(response.body_as_unicode())
    product_lines = options['products']
    for product_line in product_lines:
        payload = self.prepare_payload(product_line)
        scrapy.Request('http://example.com',
                       method="POST",
                       body=urllib.urlencode(payload),
                       callback=self.parse_items)

def parse_items(self, response):
    print response
but the requests are not executed. Does somebody know what's going on here?
Upvotes: 1
Views: 3272
Reputation: 18799
Scrapy doesn't wait for a Request to finish the way other request libraries do; it dispatches requests asynchronously.
Requests (and items) are handed over to Scrapy by yielding them from a callback: Scrapy treats callback methods as generators, checks whether each yielded object is an item or a request, returns the items, and schedules each request to be handled later by the method specified in its callback parameter.
So don't just call Request, yield Request so that Scrapy schedules it.
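For example, here is the loop from the question with the fix applied (a sketch; the URL and the prepare_payload helper are the question's own):

    def parse_product_lines(self, response):
        options = json.loads(response.body_as_unicode())
        product_lines = options['products']
        for product_line in product_lines:
            payload = self.prepare_payload(product_line)
            # yield hands the request back to Scrapy's scheduler,
            # which will call self.parse_items with each response
            yield scrapy.Request('http://example.com',
                                 method="POST",
                                 body=urllib.urlencode(payload),
                                 callback=self.parse_items)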
Upvotes: 3
Reputation: 747
First, a Spider class uses the parse method as its default callback.
Each callback should return an Item or a dict, or an iterator of them.
You should yield the requests in your parse_product_lines method to tell Scrapy what to handle next.
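For completeness, a minimal spider skeleton illustrating both points; the spider name and start URL are placeholders, and prepare_payload is assumed to exist as in the question:

    import json
    import urllib

    import scrapy

    class ProductSpider(scrapy.Spider):
        name = 'products'                            # placeholder name
        start_urls = ['http://example.com/options']  # placeholder URL

        def parse(self, response):
            # Default callback: called with the responses of start_urls.
            options = json.loads(response.body_as_unicode())
            for product_line in options['products']:
                payload = self.prepare_payload(product_line)
                yield scrapy.Request('http://example.com',
                                     method="POST",
                                     body=urllib.urlencode(payload),
                                     callback=self.parse_items)

        def parse_items(self, response):
            # Callbacks may also yield items; a plain dict works.
            yield {'body': response.body}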
Upvotes: 5