Reputation: 113
I am new to Python and Scrapy. I tried to run some existing code, but I get this error on every address:
> 2015-07-02 01:52:19 [scrapy] DEBUG: Crawled (200) <GET http://www.tripadvisor.com/ShowUserReviews-g187147-d197524-r281927613-Hotel_Mirific_Opera-Paris_Ile_de_France.html> (referer: http://www.tripadvisor.com/Hotel_Review-g187147-d197524-Reviews-Hotel_Mirific_Opera-Paris_Ile_de_France.html)
> 2015-07-02 01:52:19 [scrapy] ERROR: Spider error processing <GET http://www.tripadvisor.com/ShowUserReviews-g187147-d197524-r281927613-Hotel_Mirific_Opera-Paris_Ile_de_France.html> (referer: http://www.tripadvisor.com/Hotel_Review-g187147-d197524-Reviews-Hotel_Mirific_Opera-Paris_Ile_de_France.html)
>
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 102, in iter_errback
>     yield next(it)
>   File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/offsite.py", line 28, in process_spider_output
>     for x in result:
>   File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
>     return (_set_referer(r) for r in result or ())
>   File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
>     return (r for r in result or () if _filter(r))
>   File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/depth.py", line 54, in <genexpr>
>     return (r for r in result or () if _filter(r))
>   File "/usr/local/lib/python2.7/dist-packages/scrapy/spiders/crawl.py", line 67, in _parse_response
>     cb_res = callback(response, **cb_kwargs) or ()
>   File "/home/talmosko/Documents/scrapy/tripAdvisor/spiders/tripAdvisor.py", line 30, in parse_item
>     item['state'] = hxs.xpath('//*[@id="PAGE"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()[0].encode('ascii', errors='ignore')
>
> IndexError: list index out of range
This is my code: http://pastebin.com/XzM5DrDD
What is the problem? It seems like the spider didn't get a response.
Thanks!
Upvotes: 3
Views: 19678
Reputation: 4511
You are trying to access an element that doesn't exist. The error is in this line:
item['state'] = hxs.xpath('//*[@id="PAGE"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()[0].encode('ascii', errors='ignore')
Probably
hxs.xpath('//*[@id="PAGE"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()
returns an empty list, so indexing it with [0] raises the IndexError. You have two options:
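One possible way to guard against the empty list is sketched below. This is only an illustration of checking before indexing, not necessarily either of the two options meant above. The helper names first_or_default and to_ascii are hypothetical, and plain lists stand in for what hxs.xpath(...).extract() returns so the snippet runs without Scrapy.

```python
def first_or_default(values, default=u''):
    """Return the first extracted string, or `default` when the list is empty.

    Hypothetical helper: `values` is expected to be the list that
    `.extract()` returns.
    """
    return values[0] if values else default


def to_ascii(text):
    """Drop non-ASCII characters, as .encode('ascii', errors='ignore') does,
    then decode back so the result is text rather than bytes."""
    return text.encode('ascii', errors='ignore').decode('ascii')


# Stand-ins for the result of hxs.xpath(...).extract() (assumed values).
matched = [u'\u00cele-de-France']   # the XPath matched one text node
unmatched = []                      # the XPath matched nothing

state_found = to_ascii(first_or_default(matched))
state_missing = to_ascii(first_or_default(unmatched))  # no IndexError here
```

In the spider this would read item['state'] = to_ascii(first_or_default(hxs.xpath(...).extract())). Scrapy 1.0 and later also offer extract_first() on the selector list, which returns None instead of raising when nothing matches. Either way, an empty result on every page suggests the XPath itself no longer matches the page layout, so it is worth checking that as well.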
Upvotes: 2