Reputation: 65
I'm getting an error while processing a URL with Scrapy 1.5.0 on Python 2.7.14.
class FootLockerSpider(Spider):

    name = "FootLockerSpider"
    allowed_domains = ["footlocker.it"]
    start_urls = [FootLockerURL]

    def __init__(self):
        logging.critical("FootLockerSpider STARTED.")

    def parse(self, response):
        products = Selector(response).xpath('//div[@class="fl-category--productlist"]')

        for product in products:
            item = FootLockerItem()
            item['name'] = product.xpath('.//a/span[@class="fl-product-tile--name"]/span').extract()[0]
            item['link'] = product.xpath('.//a/@href').extract()[0]
            # item['image'] = product.xpath('.//div/a/div/img/@data-original').extract()[0]
            # item['size'] = '**NOT SUPPORTED YET**'
            yield item

        yield Request(FootLockerURL, callback=self.parse, dont_filter=True, priority=14)
This is my class FootLockerSpider, and this is the error I get:
[scrapy.core.scraper] ERROR: Spider error processing <GET https://www.footlocker.it/it/uomo/scarpe/> (referer: None)
  File "C:\Users\Traian\Downloads\Sneaker-Notify\main\main.py", line 484, in parse
    item['name'] = product.xpath('.//a/span[@class="fl-product-tile--name"]/span').extract()[0]
IndexError: list index out of range
How can I solve this problem?
Upvotes: 0
Views: 605
Reputation: 10666
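Always check the page's raw HTML source, because that is what Scrapy receives. The product tiles are lazy-loaded by JavaScript, so the span with class fl-product-tile--name does not exist in the response; the XPath returns an empty list and extract()[0] raises the IndexError. The source contains only a noscript fallback for each product: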
<div class="fl-category--productlist--item" data-category-item><div class="fl-load-animation fl-product-tile--container"
data-lazyloading
data-lazyloading-success-handler="lazyloadingInit"
data-lazyloading-context="product-tile"
data-lazyloading-content-handler="lazyloadingJSONContentHandler"
data-request="https://www.footlocker.it/INTERSHOP/web/WFS/Footlocker-Footlocker_IT-Site/it_IT/-/EUR/ViewProductTile-ProductTileJSON?BaseSKU=314213410104&ShowRating=true&ShowQuickBuy=true&ShowOverlay=true&ShowBadge=true"
data-scroll-to-target="fl-product-tile-314213410104"
>
<noscript>
<a href="https://www.footlocker.it/it/p/nike-air-max-97-ultra-17-uomo-scarpe-46994?v=314213410104"><span itemprop="name">Nike Air Max 97 Ultra '17 - Uomo Scarpe</span></a>
</noscript>
</div>
</div>
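This will work, because it selects each item container and reads the name and link from the noscript fallback. extract_first() also returns None instead of raising when nothing matches: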
products = response.xpath('//div[@class="fl-category--productlist--item"]')
for product in products:
    item = FootLockerItem()
    item['name'] = product.xpath('.//a/span/text()').extract_first()
    item['link'] = product.xpath('.//a/@href').extract_first()
    yield item
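
To see why the original selector comes back empty, here is a minimal sketch that runs without Scrapy. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors, and RAW_TILE is a trimmed, well-formed copy of one tile from the source above. Both of those, and the first_text helper, are assumptions made for illustration; in the spider itself you would keep using response.xpath.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for one product tile from the raw page source.
# (Assumption: the real markup has more attributes; only the parts the
# XPath expressions touch are kept here.)
RAW_TILE = """
<div class="fl-category--productlist--item">
  <div class="fl-load-animation fl-product-tile--container">
    <noscript>
      <a href="https://www.footlocker.it/it/p/nike-air-max-97-ultra-17-uomo-scarpe-46994?v=314213410104"><span itemprop="name">Nike Air Max 97 Ultra '17 - Uomo Scarpe</span></a>
    </noscript>
  </div>
</div>
"""

tile = ET.fromstring(RAW_TILE)

# Selector from the question: this span is only created by JavaScript,
# so it is absent from the raw source and nothing matches.
rendered_only = tile.findall('.//a/span[@class="fl-product-tile--name"]/span')

# Selector that matches the <noscript> fallback present in the raw source.
fallback = tile.findall('.//a/span')


def first_text(nodes):
    """Return the text of the first node, or None when nothing matched.

    Hypothetical helper that mirrors what extract_first() does in Scrapy,
    instead of indexing [0] into a possibly empty list.
    """
    return nodes[0].text if nodes else None


name_missing = first_text(rendered_only)
name_found = first_text(fallback)
link = tile.find('.//a').get('href')

print(len(rendered_only), name_missing)
print(name_found)
print(link)
```

Indexing rendered_only[0] here would raise the same IndexError as in the question, while the first-or-None approach lets the spider skip or log a tile with missing data.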
Upvotes: 1