Spaceman

Reputation: 1205

How to pass some information between parse_item calls?

OK, imagine a website with a list. Each item in that list carries one piece of information we need; the second piece lives at another URL, which is unique to each item.

Currently our crawler opens the list page, scrapes each item, and for each item opens that second URL to get the second piece of information. We use the requests lib, which is excellent in almost all cases, but here it is slow and inefficient: the whole Twisted reactor appears to be blocked until each requests call finishes.

pseudo-code:

def parse_item(self, response):
    for item in item_list:
        # blocking call: Twisted's event loop stalls until it returns
        content2 = requests.get(item['url']).text

We can't just let Scrapy crawl these second URLs on its own, because we need to 'connect' the first and second URLs somehow. Something like Redis would work, but surely there is a better (simpler, faster) way to do this in Scrapy? I can't believe it has to be this complicated.

Upvotes: 1

Views: 46

Answers (1)

Krishna Sunuwar

Reputation: 2945

You can do this by passing a variable in meta.

For example:

req = Request(url='http://somedomain.com/path', callback=self.myfunc)
req.meta['var1'] = 'some value'

yield req

And in your myfunc, you read the passed variable as:

myval = response.request.meta['var1']
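To show how this connects the two URLs end to end, here is a minimal sketch of the pattern: the list callback yields one request per item with the partially-filled item stored in meta, and the detail callback retrieves it. The `FakeRequest`/`FakeResponse` stubs are illustrative stand-ins so the sketch runs without Scrapy or a network; in a real spider you would use `scrapy.Request` and Scrapy's engine would drive the callbacks.

```python
class FakeRequest:
    # Stand-in for scrapy.Request: holds a url, a callback, and a meta dict.
    def __init__(self, url, callback, meta=None):
        self.url = url
        self.callback = callback
        self.meta = meta or {}

class FakeResponse:
    # Stand-in for a Scrapy response; meta is taken from the request,
    # mirroring how Scrapy exposes response.meta.
    def __init__(self, request, body):
        self.request = request
        self.meta = request.meta
        self.body = body

def parse_item(item_list):
    # First callback: schedule one request per item for the second URL,
    # carrying the item along in meta.
    for item in item_list:
        yield FakeRequest(url=item['url'], callback=parse_detail,
                          meta={'item': item})

def parse_detail(response):
    # Second callback: the item scheduled above arrives via meta,
    # so we can attach the second piece of information to it.
    item = response.meta['item']
    item['content2'] = response.body
    return item

# Drive the two callbacks by hand, as Scrapy's engine would.
items = [{'name': 'a', 'url': 'http://example.com/a'},
         {'name': 'b', 'url': 'http://example.com/b'}]
results = []
for req in parse_item(items):
    resp = FakeResponse(req, body='detail page for ' + req.url)
    results.append(req.callback(resp))

print(results[0]['content2'])  # detail page for http://example.com/a
```

Note there is no shared external store involved: each item rides inside its own request's meta dict, so the two pages stay connected without Redis.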

Upvotes: 2
