Suppose my start URLs are each paired with an ID, e.g., (http://www.foo.com, 53453). Is there a way to read the ID in parse() given a response (assuming response.url is a start URL)? Is there a way to give the response a custom 'payload'? I know I could do DB lookups, but I wonder if it could be done in memory.
Thanks
Override the start_requests() method and yield Request instances, passing the id inside the meta dictionary:
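```python
from scrapy import Spider, Request

class MySpider(Spider):
    name = 'myspider'

    def start_requests(self):
        items = get_url_and_ids_from_db()  # your own DB lookup (placeholder)
        for url, item_id in items:
            # meta travels with the request and is exposed as response.meta
            yield Request(url, meta={'id': item_id})

    def parse(self, response):
        item_id = response.meta['id']
        ...
```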
Alternatively, you can load the url->id mapping from the database in __init__() and look up the id by response.url in the parse() method:
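```python
from scrapy import Spider

class MySpider(Spider):
    name = 'myspider'

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        # placeholder: should return a {start_url: id} dict
        self.mapping = get_url_id_mapping_from_db()
        self.start_urls = list(self.mapping.keys())

    def parse(self, response):
        item_id = self.mapping[response.url]  # assumes no redirect
        ...
```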