Kar

Reputation: 6345

How to load start URLs with parameters?

Suppose each of my start URLs is paired with an ID, e.g., (http://www.foo.com, 53453). Is there a way to read the ID in parse() given a response (assuming response.url is a start URL)? Can I attach a custom 'payload' to the response? I know I could do database lookups, but I wonder whether it can be done in memory.

Thanks

Upvotes: 2

Views: 123

Answers (1)

alecxe

Reputation: 473873

Override the start_requests() method and yield Request instances, passing the id in the meta dictionary:

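from scrapy import Request, Spider

class MySpider(Spider):
    def start_requests(self):
        # get_url_and_ids_from_db() stands for your own code returning (url, id) pairs
        items = get_url_and_ids_from_db()
        for url, id in items:
            # meta travels with the request and is exposed again on the response
            yield Request(url, meta={'id': id})

    def parse(self, response):
        id = response.meta['id']
        ...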

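Alternatively, you can load the url->id mapping from the database in __init__() and look up the id by response.url in the parse() method. Note that this relies on response.url matching the start URL exactly, so the lookup can fail if a request gets redirected: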

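from scrapy import Spider

class MySpider(Spider):
    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)

        # get_url_id_mapping_from_db() stands for your own code returning a {url: id} dict
        self.mapping = get_url_id_mapping_from_db()

        # The mapping's keys are the start URLs
        self.start_urls = list(self.mapping)

    def parse(self, response):
        id = self.mapping[response.url]
        ...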

Upvotes: 2
