Dionysian

Reputation: 1235

Scrapy: Passing item between methods

Suppose I have a BookItem and need to add information to it in both the parse phase and the detail phase:

def parse(self, response):
    data = json.loads(response.body)
    for book in data['result']:
        item = BookItem()
        item['id'] = book['id']
        url = book['url']
        yield Request(url, callback=self.detail)

def detail(self, response):
    hxs = HtmlXPathSelector(response)
    # I want to continue filling in the same book item from the for loop above
    item['price'] = ......

Using the code as is leads to an undefined item in the detail phase. How can I pass the item to detail? Changing the signature to detail(self, response, item) doesn't seem to work.

Upvotes: 22

Views: 9940

Answers (3)

tbrk

Reputation: 283

iMom0's approach still works, but as of Scrapy 1.7 the recommended approach is to pass user-defined information through cb_kwargs and leave meta for middlewares, extensions, etc.:

def parse(self, response):
    ....
    yield Request(url, callback=self.detail, cb_kwargs={'item': item})

def detail(self, response, item):
    item['price'] = ......

You could also pass the individual key-value pairs through cb_kwargs and only instantiate the BookItem in the final callback (detail in this case):

def parse(self, response):
    data = json.loads(response.body)
    for book in data['result']:
        # Use book['url'] directly; no separate url variable is defined here
        yield Request(book['url'],
                      callback=self.detail,
                      cb_kwargs=dict(id_=book['id'],
                                     url=book['url']))

def detail(self, response, id_, url):
    hxs = HtmlXPathSelector(response)
    item = BookItem()
    item['id'] = id_
    item['url'] = url
    item['price'] = ......

Upvotes: 4

iMom0

Reputation: 12911

There is an argument named meta for Request:

yield Request(url, callback=self.detail, meta={'item': item})

Then, in the detail function, access it this way:

item = response.meta['item']

See the jobs topic in the Scrapy documentation for more details.

Upvotes: 35

greg

Reputation: 1416

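You can define a variable in the __init__ method. Note that callbacks run only after their responses arrive, so a single shared attribute is overwritten on every loop iteration; this is only safe when one item is in flight at a time: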

class MySpider(BaseSpider):
    ...

    def __init__(self, *args, **kwargs):
        # Let the base spider run its own initialisation first
        super(MySpider, self).__init__(*args, **kwargs)
        self.item = None

    def parse(self, response):
        data = json.loads(response.body)
        for book in data['result']:
            self.item = BookItem()
            self.item['id'] = book['id']
            url = book['url']
            yield Request(url, callback=self.detail)

    def detail(self, response):
        hxs = HtmlXPathSelector(response)
        self.item['price'] = ....
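
To see why a shared attribute behaves differently from per-request data, here is a minimal sketch in plain Python. It is not Scrapy: ToySpider, run, and the tuple-based "requests" are hypothetical stand-ins, and the assumption that every request is queued before any callback runs is a simplification of real scheduling. It only illustrates that callbacks executed later see whatever the shared attribute held last, while data attached to each request stays with that request.

```python
from collections import deque


class ToySpider:
    """Hypothetical stand-in for a spider; it only mimics deferred callbacks."""

    def __init__(self):
        self.item = None  # single shared attribute

    def parse_shared(self, books):
        for book in books:
            self.item = {'id': book['id']}  # overwritten on each iteration
            # The callback is queued here, not executed.
            yield (self.detail_shared, book['url'], {})

    def detail_shared(self, url):
        self.item['price'] = 'price from ' + url
        return dict(self.item)

    def parse_per_request(self, books):
        for book in books:
            item = {'id': book['id']}
            # The item travels with its own request, like meta or cb_kwargs.
            yield (self.detail_per_request, book['url'], {'item': item})

    def detail_per_request(self, url, item):
        item['price'] = 'price from ' + url
        return item


def run(requests):
    """Queue every request first, then invoke the callbacks in order."""
    queue = deque(requests)  # exhausts the generator before any callback runs
    return [callback(url, **kwargs) for callback, url, kwargs in queue]


books = [{'id': 1, 'url': 'a'}, {'id': 2, 'url': 'b'}]
shared = run(ToySpider().parse_shared(books))
per_request = run(ToySpider().parse_per_request(books))

print([i['id'] for i in shared])       # [2, 2]
print([i['id'] for i in per_request])  # [1, 2]
```

In this toy model the shared-attribute version reports the last book's id for every result, whereas the per-request version keeps each id paired with its own URL.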

Upvotes: 2
