rhctomoefjklioizsx

Reputation: 61

Scrapy getting values from other links

def parse(self, response):
    list = {
        "name": response.css("#title::text").extract_first(),
        "images": []
    }

    for image in response.css("#images_link a::attr(href)").extract():
        list["images"].append(yield scrapy.Request(url=image, callback=self.parse_image))
        
    yield list
    
def parse_image(self, response):
    return [response.css("img::attr(alt)").extract(), response.css("img::attr(src)").extract()]

I want to scrape a page and also scrape some of its child links, appending their results to the main "list" object, but I get many separate objects back instead of one.

How do I do this correctly?

Upvotes: 0

Views: 58

Answers (1)

SuperUser

Reputation: 4822

It's better to rename "list" to something else, since that name shadows Python's built-in list type.

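A Request doesn't work like that: yielding it only schedules the download, and the callback's result is not handed back to the line that yielded it. You need to pass the item ("dict1" in the example) along to the callback instead.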
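
To see why the loop in the question misbehaves, here is a minimal plain-Python sketch (no Scrapy involved; the URLs and the "request for ..." strings are made-up placeholders). When a generator is simply iterated, which as far as I know is how Scrapy consumes a callback, each yield expression evaluates to None, so every request leaves the function as a separate output and None is what gets appended:

```python
def parse():
    # Stand-in for the question's callback; plain strings play the role of Request objects.
    item = {"name": "example", "images": []}
    for url in ["/img/1", "/img/2"]:
        # The yielded value leaves the generator. The yield expression itself
        # evaluates to None because the consumer only iterates and never sends
        # a value back in.
        item["images"].append((yield "request for " + url))
    yield item


outputs = list(parse())
for output in outputs:
    print(output)
```

This produces three outputs: the two placeholder requests, then the item with "images" set to [None, None].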

Here's an example (you didn't provide the website's URL, so I couldn't test it, but I hope you get the point):

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "exampleSpider"
    start_urls = ['https://www.example.com']

    def parse(self, response):
        dict1 = {
            "name": response.css("#title::text").get(),
            "images": []
        }

        images_urls = response.css("#images_link a::attr(href)").getall()

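        if images_urls:
            # The request must be yielded, otherwise Scrapy never schedules it.
            yield scrapy.Request(url=images_urls[0],
                                 callback=self.parse_image,
                                 cb_kwargs={'dict1': dict1, 'images_urls': images_urls[1:]})
        else:
            # No image links on the page: emit the item as it is.
            yield dict1

    def parse_image(self, response, dict1, images_urls):
        # Record this image page first, so the last page in the chain is not skipped.
        dict1['images'].append([response.css("img::attr(alt)").getall(), response.css("img::attr(src)").getall()])
        if images_urls:
            # More links left: follow the next one and carry the item along.
            yield scrapy.Request(url=images_urls[0],
                                 callback=self.parse_image,
                                 cb_kwargs={'dict1': dict1, 'images_urls': images_urls[1:]})
        else:
            # Chain finished: yield the single, fully populated item.
            yield dict1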

For more detail, read the Scrapy documentation on Request objects and on passing additional data to callback functions.
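
If you want to try the chaining idea without touching a real site, here is a rough offline sketch. Everything in it is hypothetical: FakeRequest, the PAGES dictionary and the crawl loop are toy stand-ins I made up, not Scrapy's API. It only illustrates that the item travels through cb_kwargs-style arguments and is emitted once, by the last callback in the chain:

```python
from collections import deque


class FakeRequest:
    """Hypothetical stand-in for scrapy.Request: a URL, a callback and extra kwargs."""

    def __init__(self, url, callback, cb_kwargs=None):
        self.url = url
        self.callback = callback
        self.cb_kwargs = cb_kwargs or {}


# Made-up "site": each URL maps to the data a real response would expose.
PAGES = {
    "/item": {"name": "Widget", "links": ["/img/1", "/img/2"]},
    "/img/1": {"alt": "front", "src": "front.png"},
    "/img/2": {"alt": "back", "src": "back.png"},
}


def parse(page):
    item = {"name": page["name"], "images": []}
    links = page["links"]
    if links:
        # Start the chain: visit the first link and carry the item plus the rest.
        yield FakeRequest(links[0], parse_image, {"item": item, "remaining": links[1:]})
    else:
        yield item


def parse_image(page, item, remaining):
    # Store this page's data, then either continue the chain or finish it.
    item["images"].append([page["alt"], page["src"]])
    if remaining:
        yield FakeRequest(remaining[0], parse_image, {"item": item, "remaining": remaining[1:]})
    else:
        yield item


def crawl(start_url):
    # Toy scheduler: run callbacks, queue any requests they yield, collect items.
    queue = deque([FakeRequest(start_url, parse)])
    items = []
    while queue:
        request = queue.popleft()
        for result in request.callback(PAGES[request.url], **request.cb_kwargs):
            if isinstance(result, FakeRequest):
                queue.append(result)
            else:
                items.append(result)
    return items


items = crawl("/item")
print(items)
```

Only one item comes out, and it already contains both image entries. The cost of this design is that the image pages are fetched one after another rather than concurrently.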

Upvotes: 1
