Reputation: 2114
What I do:
from scrapy import Request

# Methods of the spider class
def parse(self, response):
    product_urls = response.css('.product-item a::attr(href)').extract()
    for product_url in product_urls:
        # Requests are only scheduled here; their callbacks run later
        yield Request(product_url, callback=self.parse_product)
    print("Continue doing stuff....")

def parse_product(self, response):
    title = response.css('h1::text').extract_first()
    print(title)
In this example, the code first prints "Continue doing stuff...." and only afterwards prints the product titles. I would like the opposite order: make the requests and print the titles first, and only then print "Continue doing stuff....".
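To illustrate why the output comes in this order, here is a toy model in plain Python. It is not Scrapy's actual engine; `run`, `log`, and the URLs are made up for the illustration. It only shows that yielding a request hands it to a scheduler, so the rest of `parse` can finish before any product callback runs.

```python
from collections import deque

log = []  # records the order in which things happen


def parse(urls):
    """Stand-in for the spider callback: a generator that yields 'requests'."""
    for url in urls:
        yield url  # handed to the scheduler, not fetched here
    log.append("Continue doing stuff....")


def parse_product(url):
    """Stand-in for the per-product callback."""
    log.append(f"title of {url}")


def run(urls):
    """Hypothetical toy engine: queue every request, then process the queue."""
    queue = deque(parse(urls))  # consuming the generator runs parse() to its end
    while queue:
        parse_product(queue.popleft())


run(["/a", "/b"])
print(log)
# ['Continue doing stuff....', 'title of /a', 'title of /b']
```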
UPDATE: @Georgiy asked in the comments whether I need the previously scraped product data.
The answer is yes; this is a simplified example. After the data is fetched, I want to manipulate it.
Upvotes: 2
Views: 279
Reputation: 23
Note: I can't comment due to lack of reputation.
While the code in the other answer works for most cases, I suggest using self.crawler.stats to keep track of the count, because manually decrementing a counter with many concurrent requests might introduce a race condition. Example code below:
def parse(self, response):
    product_urls = response.css('.product-item a::attr(href)').extract()
    # Number of product pages we expect to process
    self.count = len(product_urls)
    self.crawler.stats.set_value('processed_product_pages', 0)
    if self.count == 0:
        self.onEnd()
    else:
        for product_url in product_urls:
            yield Request(product_url, callback=self.parse_product)

def onEnd(self):
    print("Continue doing stuff....")

def parse_product(self, response):
    title = response.css('h1::text').extract_first()
    print(title)
    # Let the stats collector count the processed pages
    self.crawler.stats.inc_value('processed_product_pages')
    if self.count == self.crawler.stats.get_value('processed_product_pages', 0):
        self.onEnd()
Upvotes: 0
Reputation: 6298
You can move the logic into the parse_product function. For example:
def parse(self, response):
    product_urls = response.css('.product-item a::attr(href)').extract()
    # Number of product pages still waiting for a response
    self.count = len(product_urls)
    if self.count == 0:
        self.onEnd()
    else:
        for product_url in product_urls:
            yield Request(product_url, callback=self.parse_product)

def onEnd(self):
    print("Continue doing stuff....")

def parse_product(self, response):
    title = response.css('h1::text').extract_first()
    print(title)
    # One fewer page outstanding; run the follow-up once all are done
    self.count -= 1
    if self.count == 0:
        self.onEnd()
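One caveat with a plain counter: if a request fails, or is dropped as a duplicate, parse_product is never called for it and the count never reaches zero. A possible way to make this more robust is to put the bookkeeping in a small helper and also call it from the request's errback. The sketch below is plain Python; `PendingCounter` and `mark_done` are hypothetical names for this illustration, not part of Scrapy.

```python
class PendingCounter:
    """Hypothetical helper: calls on_done once, when all items are reported."""

    def __init__(self, expected, on_done):
        self.remaining = expected
        self.on_done = on_done
        self.fired = False
        if expected == 0:
            self._fire()  # nothing to wait for

    def _fire(self):
        # Guard so that on_done runs at most once
        if not self.fired:
            self.fired = True
            self.on_done()

    def mark_done(self):
        """Report one finished item, whether it succeeded or failed."""
        self.remaining -= 1
        if self.remaining <= 0:
            self._fire()


events = []
counter = PendingCounter(2, lambda: events.append("Continue doing stuff...."))
counter.mark_done()          # first product page handled
events.append("after first")
counter.mark_done()          # second page handled: on_done fires here
counter.mark_done()          # an extra report does not fire on_done again
print(events)
# ['after first', 'Continue doing stuff....']
```

In the spider, parse would create the counter with len(product_urls) and self.onEnd, and both parse_product and an errback passed to Request would call mark_done().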
Upvotes: 3