Can't get image with XPath

Question

For some reason I can't get the XPath right to grab a product image from Nordstrom.com. I'm using Scrapy; here's my code. The strPicture field always comes out empty:

from scrapy.spider import Spider
from scrapy.selector import Selector
from bed2.items import bed2Item
import urlparse

class MySpider(Spider):
    name = "bed2"
    allowed_domains = ["nordstrom.com", "nordstromimage.com"]
    start_urls = ["http://shop.nordstrom.com/c/bedding-home?origin=leftnav#category=b60175057&type=category&marketingslots=2&page=1&defaultsize3=&size=&width=&color=&price=&brand=&instoreavailability=false&lastfilter=&sizeFinderId=0&resultsmode=&segmentId=0&sort=newest&sortreverse=0"]

    # Scrapy calls parse() with the response for each start URL
    def parse(self, response):
        hxs = Selector(response)
        # each product on the listing page is a div with class "fashion-item"
        titles = hxs.xpath("//div[@class='fashion-item']")
        items = []
        # only look at the first product for now
        for titles in titles[:1]:
            item = bed2Item()
            item["strTitle"] = titles.xpath("div[2]/a[1]/text()").extract()
            item["strLink"] = urlparse.urljoin(response.url, titles.xpath("div[2]/a[1]/@href").extract()[0])
            item["strPrice"] = "0"
            # this is the expression that always returns an empty list
            item["strPicture"] = titles.xpath("a/div[1]/img/@src").extract()
            items.append(item)
        return items

The URL I'm scraping is:

http://shop.nordstrom.com/c/bedding-home?origin=leftnav#category=b60175057&type=category&marketingslots=2&page=1&defaultsize3=&size=&width=&color=&price=&brand=&instoreavailability=false&lastfilter=&sizeFinderId=0&resultsmode=&segmentId=0&sort=newest&sortreverse=0

I'm trying to get the first product.

Thanks

Jens Erat · Accepted Answer

Looking at the page source, I cannot find the anchor tag your picture XPath starts with, so omit it. Further, the images seem to be filled in by JavaScript after the page loads; in the HTML that Scrapy actually receives, the image URL is stored in the @data-original attribute.

item["strPicture"] = titles.xpath("div[1]/div/img/@data-original").extract()
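A minimal sketch of the same idea outside Scrapy, using lxml directly (Scrapy's selectors are built on lxml, so the XPath expressions carry over). The markup is a hypothetical stand-in for one product, not Nordstrom's actual HTML, and the fallback from @data-original to @src is an added assumption for images that are not filled in by JavaScript.

```python
from lxml import html

# Hypothetical markup for illustration only; the real page structure may differ.
SAMPLE = """
<div class="fashion-item">
  <div>
    <div>
      <img src="placeholder.gif"
           data-original="http://example.com/images/product-1.jpg">
    </div>
  </div>
  <div>
    <a class="title" href="/s/product-1">Product <b>One</b></a>
  </div>
</div>
"""


def picture_url(item):
    """Return the image URL of one product element, or None.

    Tries @data-original first (the attribute holding the real URL before
    JavaScript runs) and falls back to @src if it is missing (assumption).
    """
    for expr in (".//img/@data-original", ".//img/@src"):
        values = item.xpath(expr)
        if values:
            return values[0]
    return None


tree = html.fromstring(SAMPLE)
for item in tree.xpath("//div[@class='fashion-item']"):
    print(picture_url(item))  # http://example.com/images/product-1.jpg
```

Using the relative .//img path means the lookup does not depend on how deeply the image is nested inside the product div.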

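Anyway, as there isn't any other picture in that div, why not just query .//img/@data-original? The same applies to the title, which can be selected by its class attribute, as in .//a[@class='title']/text(), or more robustly string(.//a[@class='title']), which also includes text inside child elements. (XPath 2.0 offers data() for this, but Scrapy's lxml-based selectors only support XPath 1.0.)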
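To illustrate why string() is more robust than text(), here is a small lxml sketch with hypothetical markup in which part of the title sits inside a child element. text() returns only the text nodes directly under the matched <a>, while string() concatenates all descendant text of the first matching node.

```python
from lxml import html

# Hypothetical product markup; the <b> child shows the text()/string() difference.
SAMPLE = ('<div class="fashion-item">'
          '<a class="title" href="/s/1">Product <b>One</b></a>'
          '</div>')

item = html.fromstring(SAMPLE)

# text(): only text nodes that are direct children of <a>, so "One" is missed
direct = item.xpath(".//a[@class='title']/text()")
print(direct)

# string(): XPath 1.0 string value of the first matching <a>, all descendant text
full = item.xpath("string(.//a[@class='title'])")
print(full)  # Product One
```

In a Scrapy callback the same expression should work on a selector, e.g. titles.xpath("string(.//a[@class='title'])").extract(); using normalize-space() instead of string() additionally trims and collapses whitespace.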
