Reputation: 65
I'm using https://github.com/rolando/scrapy-redis to create a spider that reads URLs from a Redis list. The problem I have is that I want to send a unique ID alongside each URL, so that I can identify the entry in the db again.
My list in Redis looks like this:
http://google.com[someuniqueid]
http://example.com[anotheruniqueid]
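For reference, here's a minimal sketch of how such entries could be seeded into the list with the standard redis-py client, assuming a spider named "myspider" and the default redis_key of "myspider:start_urls" (both names are illustrative):

import redis

r = redis.StrictRedis(host='localhost', port=6379)

# Each entry packs the URL and a unique ID into one string,
# matching the format described above.
r.lpush('myspider:start_urls', 'http://google.com[someuniqueid]')
r.lpush('myspider:start_urls', 'http://example.com[anotheruniqueid]')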
By default, scrapy-redis reads only a URL from Redis, which is then passed to the spider.
I modified this file: https://github.com/rolando/scrapy-redis/blob/master/scrapy_redis/spiders.py
And changed this function:
def next_request(self):
    """Returns a request to be scheduled or None."""
    url = self.server.lpop(self.redis_key)
    if url:
        # Split "http://google.com[someuniqueid]" into URL and ID parts.
        mm = url.split("[")
        # Store the ID on the spider instance so it can be read later.
        self.guid = mm[1].replace("]", "")
        return self.make_requests_from_url(mm[0])
This works: I can get the guid inside my spider by calling:
print self.guid
The problem, however, is that it seems to mix up the guids; I don't always have the correct guid for each URL.
How should I send the guid to my spider?
Upvotes: 0
Views: 513
Reputation: 21446
This happens because Scrapy is asynchronous: several requests can be in flight at once, so by the time a response is parsed, self.guid may already have been overwritten by a later pop. You can't rely on storing per-request data in an instance variable. There are a few ways to approach this; the most common is to use scrapy.Request with a meta={'guid': guid} argument.
replace this line:
return self.make_requests_from_url(mm[0])
with:
return scrapy.Request(mm[0], meta={'guid': mm[1].replace("]", "")})
and now in your parse() you can access the guid with:
def parse(self, response):
    guid = response.meta['guid']
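Putting it together, the modified next_request() could look like the sketch below (written against the old scrapy-redis API the question uses; make_requests_from_url is no longer needed since the Request is built directly, and the callback defaults to parse()):

import scrapy

def next_request(self):
    """Returns a request to be scheduled or None."""
    data = self.server.lpop(self.redis_key)
    if data:
        # Split "http://google.com[someuniqueid]" into URL and guid.
        url, _, guid = data.partition("[")
        # Attach the guid to this specific request instead of the spider,
        # so concurrent requests can't overwrite each other's value.
        return scrapy.Request(url, meta={'guid': guid.rstrip("]")})

Each Request now carries its own guid in meta, so the value read in parse() always matches the URL that produced the response.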
Upvotes: 2