Reputation: 61
def parse(self, response):
    list = {
        "name": response.css("#title::text").extract_first(),
        "images": []
    }
    for image in response.css("#images_link a::attr(href)").extract():
        list["images"].append(yield scrapy.Request(url=image, callback=self.parse_image))
    yield list

def parse_image(self, response):
    return [response.css("img::attr(alt)").extract(), response.css("img::attr(src)").extract()]
I want to scrape a page and also scrape some of its child links, appending their data to the main item, but it returns many separate objects instead of one.
How do I do this correctly?
Upvotes: 0
Views: 58
Reputation: 4822
First, rename "list" to something else; it shadows Python's built-in list type.
Request doesn't work like that: Scrapy only schedules a yielded request, so the yield expression inside append() evaluates to None rather than the response. You'll need to pass the item ("dict1" in the example) along to the callback and yield it only once every image page has been visited.
Here's an example (since you didn't provide the website's URL I couldn't test it, but I hope you get the point):
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "exampleSpider"
    start_urls = ['https://www.example.com']

    def parse(self, response):
        dict1 = {
            "name": response.css("#title::text").get(),
            "images": []
        }
        images_urls = response.css("#images_link a::attr(href)").getall()
        if images_urls:
            # Follow the first image link and carry both the item and the
            # remaining URLs along to the callback via cb_kwargs.
            yield scrapy.Request(url=images_urls[0],
                                 callback=self.parse_image,
                                 cb_kwargs={'dict1': dict1, 'images_urls': images_urls[1:]})
        else:
            # No image links at all: yield the item as-is.
            yield dict1

    def parse_image(self, response, dict1, images_urls):
        # Record this page's data on the item first, so the last page
        # isn't dropped.
        dict1['images'].append([response.css("img::attr(alt)").getall(),
                                response.css("img::attr(src)").getall()])
        if images_urls:
            # More image pages to visit: chain the next request.
            yield scrapy.Request(url=images_urls[0],
                                 callback=self.parse_image,
                                 cb_kwargs={'dict1': dict1, 'images_urls': images_urls[1:]})
        else:
            # All image pages visited: yield the completed item exactly once.
            yield dict1
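Note that cb_kwargs was added in Scrapy 1.7. On older versions the same chaining pattern works by carrying the item in request.meta instead; here's a minimal sketch under that assumption, with the same placeholder URL and selectors as above (and the older extract()/extract_first() API, which works on any version):

import scrapy

class MetaExampleSpider(scrapy.Spider):
    name = "metaExampleSpider"
    start_urls = ['https://www.example.com']  # placeholder URL, as above

    def parse(self, response):
        dict1 = {
            "name": response.css("#title::text").extract_first(),
            "images": []
        }
        images_urls = response.css("#images_link a::attr(href)").extract()
        if images_urls:
            # Stash the item and the remaining URLs in request.meta.
            yield scrapy.Request(url=images_urls[0],
                                 callback=self.parse_image,
                                 meta={'dict1': dict1, 'images_urls': images_urls[1:]})
        else:
            yield dict1

    def parse_image(self, response):
        # Pull the item and the remaining URLs back out of response.meta.
        dict1 = response.meta['dict1']
        images_urls = response.meta['images_urls']
        dict1['images'].append([response.css("img::attr(alt)").extract(),
                                response.css("img::attr(src)").extract()])
        if images_urls:
            yield scrapy.Request(url=images_urls[0],
                                 callback=self.parse_image,
                                 meta={'dict1': dict1, 'images_urls': images_urls[1:]})
        else:
            yield dict1

Either way, the requests form a sequential chain: each image page is fetched one after another, and the item is yielded exactly once, at the end. That serializes the image-page downloads; if crawl speed matters more than keeping everything in one callback chain, yielding all the requests at once and merging the partial results in an item pipeline is the usual alternative.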
Read these in the Scrapy docs: Request objects, and passing additional data to callback functions.
Upvotes: 1