Reputation: 26748
I am working with the Scrapy framework. Below is my spider.py code:
class Example(BaseSpider):
    name = "example"
    allowed_domains = {"http://www.example.com"}
    start_urls = [
        "http://www.example.com/servlet/av/search&SiteName=page1"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        hrefs = hxs.select('//table[@class="knxa"]/tr/td/a/@href').extract()
        # hrefs holds all the href values; I copy them into forwarding_hrefs as encoded strings
        forwarding_hrefs = []
        for i in hrefs:
            forwarding_hrefs.append(i.encode('utf-8'))
        return Request('http://www.example.com/servlet/av/search&SiteName=page2',
                       meta={'forwarding_hrefs': response.meta['forwarding_hrefs']},
                       callback=self.parseJob)

    def parseJob(self, response):
        print response,">>>>>>>>>>>"
Result:
2012-07-18 17:29:15+0530 [example] DEBUG: Crawled (200) <GET http://www.example.com/servlet/av/search&SiteName=page1> (referer: None)
2012-07-18 17:29:15+0530 [MemorialReqionalHospital] ERROR: Spider error processing <GET http://www.example.com/servlet/av/search&SiteName=page2>
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1167, in mainLoop
    self.runUntilCurrent()
  File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 789, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 361, in callback
    self._startRunCallbacks(result)
  File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 542, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/local/user/project/example/example/spiders/example_spider.py", line 36, in parse
    meta={'forwarding_hrefs': response.meta['forwarding_hrefs']},
exceptions.KeyError: 'forwarding_hrefs'
What I am trying to do is collect all the href values from http://www.example.com/servlet/av/search&SiteName=page1, store them in forwarding_hrefs, and pass that list along with the next request, for http://www.example.com/servlet/av/search&SiteName=page2, so that I can use it in the next method.
There I also want to add the href values from page2 to forwarding_hrefs, then loop over the list and yield a request for each href. That is my idea, but it fails with the error above. What is wrong with the code? As far as I understand, meta is meant for carrying data like this over to the next callback.
Can anyone please tell me how to copy the forwarding_hrefs list from the parse method to the parseJob method?
I hope I explained this well; if not, please let me know.
Thanks in advance.
Upvotes: 0
Views: 2020
Reputation: 2733
I haven't tested this, but it looks like the error is here:
    return Request('http://www.example.com/servlet/av/search&SiteName=page2',
                   meta={'forwarding_hrefs': response.meta['forwarding_hrefs']},
                   callback=self.parseJob)
You are passing response.meta['forwarding_hrefs'], but that key doesn't exist for this response, because nothing put it into the meta of the request that produced it.
You need to write:
    return Request('http://www.example.com/servlet/av/search&SiteName=page2',
                   meta={'forwarding_hrefs': forwarding_hrefs},
                   callback=self.parseJob)
forwarding_hrefs is the local list you built in parse, so this sends it to parseJob inside meta. In parseJob you can then read response.meta['forwarding_hrefs'], because the key will exist on that response object.
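To make the mechanism concrete, here is a rough plain-Python sketch that runs without Scrapy. FakeRequest, FakeResponse, run_demo and the '/job/N' hrefs are hypothetical stand-ins I made up for illustration; they are not Scrapy's classes and model only one behaviour, namely that a response exposes the meta of the request that produced it.

```python
# Hypothetical stand-ins, NOT Scrapy's real classes. They model only the
# behaviour that matters here: response.meta is the meta of the request
# that produced the response.

class FakeRequest(object):
    def __init__(self, url, meta=None, callback=None):
        self.url = url
        self.meta = dict(meta or {})
        self.callback = callback


class FakeResponse(object):
    def __init__(self, request, hrefs):
        self.request = request
        self.hrefs = hrefs  # pretend these were extracted from the page

    @property
    def meta(self):
        return self.request.meta


def parse(response):
    # First callback: the start request carried no 'forwarding_hrefs', so
    # reading response.meta['forwarding_hrefs'] here would raise KeyError.
    # Pass the local list instead.
    forwarding_hrefs = list(response.hrefs)
    return FakeRequest('http://www.example.com/servlet/av/search&SiteName=page2',
                       meta={'forwarding_hrefs': forwarding_hrefs},
                       callback=parse_job)


def parse_job(response):
    # Second callback: the list set in parse() is available now, and the
    # hrefs found on page 2 can be added to it.
    return response.meta['forwarding_hrefs'] + list(response.hrefs)


def run_demo():
    # Made-up hrefs stand in for what the XPath query would return.
    start = FakeRequest('http://www.example.com/servlet/av/search&SiteName=page1')
    page1 = FakeResponse(start, ['/job/1', '/job/2'])
    next_request = parse(page1)
    page2 = FakeResponse(next_request, ['/job/3'])
    return next_request.callback(page2)
```

Calling run_demo() returns the combined list from both pages. In a real spider, parseJob would then loop over that list and yield one Request per href.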
Upvotes: 1