Shiva Krishna Bavandla

Reputation: 26748

Meta tag is not working in scrapy python

I am working with the Scrapy framework. Below is my spider.py code:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request


class Example(BaseSpider):
    name = "example"
    allowed_domains = {"http://www.example.com"}

    start_urls = [
        "http://www.example.com/servlet/av/search&SiteName=page1"
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        hrefs = hxs.select('//table[@class="knxa"]/tr/td/a/@href').extract()
        # hrefs holds all the href values; copy them into forwarding_hrefs as byte strings
        forwarding_hrefs = []
        for i in hrefs:
            forwarding_hrefs.append(i.encode('utf-8'))
        return Request('http://www.example.com/servlet/av/search&SiteName=page2',
                       meta={'forwarding_hrefs': response.meta['forwarding_hrefs']},
                       callback=self.parseJob)

    def parseJob(self, response):
        print response,">>>>>>>>>>>"

Result:

2012-07-18 17:29:15+0530 [example] DEBUG: Crawled (200) <GET http://www.example.com/servlet/av/search&SiteName=page1> (referer: None)
2012-07-18 17:29:15+0530 [MemorialReqionalHospital] ERROR: Spider error processing <GET http://www.example.com/servlet/av/search&SiteName=page2>
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1167, in mainLoop
        self.runUntilCurrent()
      File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 789, in runUntilCurrent
        call.func(*call.args, **call.kw)
      File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 361, in callback
        self._startRunCallbacks(result)
      File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 455, in _startRunCallbacks
        self._runCallbacks()
    --- <exception caught here> ---
      File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 542, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/home/local/user/project/example/example/spiders/example_spider.py", line 36, in parse
        meta={'forwarding_hrefs': response.meta['forwarding_hrefs']},
    exceptions.KeyError: 'forwarding_hrefs'

What I am trying to do is collect all the href values from

http://www.example.com/servlet/av/search&SiteName=page1

store them in forwarding_hrefs, and pass that list along with the next request (I want to use it in the next method), which goes to

http://www.example.com/servlet/av/search&SiteName=page2

I also want to add the href values from page2 to forwarding_hrefs, then loop over the list and yield a request for each href. That is my idea, but it fails with the error shown above. What is wrong with the code? As I understand it, meta is meant for carrying data over to the next callback. Can anyone tell me how to pass the forwarding_hrefs list from the parse method to the parseJob method?

In short, my intention is to get the forwarding_hrefs list from the parse method into the parseJob method.

I hope I explained this well; if not, please let me know.

Thanks in advance

Upvotes: 0

Views: 2020

Answers (1)

iblazevic

Reputation: 2733

I haven't tested this, but it looks like the error is here:

 return Request('http://www.example.com/servlet/av/search&SiteName=page2',
                meta={'forwarding_hrefs': response.meta['forwarding_hrefs']},
                callback=self.parseJob)    

You are passing response.meta['forwarding_hrefs'], but that key doesn't exist for this response: the page1 request was generated from start_urls and carries no such meta entry.

You need to pass the local list instead:

 return Request('http://www.example.com/servlet/av/search&SiteName=page2',
                meta={'forwarding_hrefs': forwarding_hrefs},
                callback=self.parseJob)  

Since forwarding_hrefs is a local variable in parse, this sends it to parseJob inside the request's meta. Inside parseJob you can then read response.meta['forwarding_hrefs'], because the key now exists on that response object.
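
To make the flow above concrete, here is a rough sketch that runs without Scrapy installed. The Request and Response classes below are hypothetical stand-ins, not Scrapy's real classes; they only mimic the one behaviour that matters here, namely that response.meta is the meta dict of the request that produced the response. The job/1 and job/2 URLs and the hrefs attribute are made up for illustration (in the real spider the hrefs would come from the XPath query), and the code is written for Python 3.

```python
class Request:
    """Hypothetical stand-in for a Scrapy request: URL, callback, meta dict."""

    def __init__(self, url, callback=None, meta=None):
        self.url = url
        self.callback = callback
        self.meta = dict(meta or {})


class Response:
    """Hypothetical stand-in for a response tied to the request that fetched it."""

    def __init__(self, url, request, hrefs=()):
        self.url = url
        self.request = request
        self.hrefs = list(hrefs)  # pretend these were extracted with XPath

    @property
    def meta(self):
        # The response exposes the meta of its originating request.
        return self.request.meta


def parse(response):
    forwarding_hrefs = list(response.hrefs)
    # response.meta belongs to the page1 request, which has no
    # 'forwarding_hrefs' key, so the local list is passed instead.
    return Request(
        "http://www.example.com/servlet/av/search&SiteName=page2",
        callback=parse_job,
        meta={"forwarding_hrefs": forwarding_hrefs},
    )


def parse_job(response):
    # The key exists here because parse() put it on the page2 request.
    forwarding_hrefs = response.meta["forwarding_hrefs"] + response.hrefs
    # One follow-up request per collected href.
    return [Request(href) for href in forwarding_hrefs]


# Simulate the crawl by hand: page1 -> parse -> page2 -> parse_job.
page1_request = Request(
    "http://www.example.com/servlet/av/search&SiteName=page1", callback=parse
)
page1 = Response(
    page1_request.url, page1_request,
    hrefs=["http://www.example.com/job/1"],  # made-up href
)
page2_request = parse(page1)
page2 = Response(
    page2_request.url, page2_request,
    hrefs=["http://www.example.com/job/2"],  # made-up href
)
job_urls = [request.url for request in parse_job(page2)]
print(job_urls)
```

In the actual spider the same idea applies: build the list in parse, hand it to the Request via meta, and read it back from response.meta in parseJob, where the hrefs from page2 can be appended before yielding one request per href.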

Upvotes: 1
