Sam T

Reputation: 1053

Can't get image with XPath

For some reason I can't get the XPath right to grab a product image from Nordstrom.com. I'm using Scrapy, and here's my code. The strPicture field always comes out empty:

from scrapy.spider import Spider
from scrapy.selector import Selector
from bed2.items import bed2Item
import urlparse

class MySpider(Spider):
    name = "bed2"
    allowed_domains = ["nordstrom.com", "nordstromimage.com"]
    start_urls = ["http://shop.nordstrom.com/c/bedding-home?origin=leftnav#category=b60175057&type=category&marketingslots=2&page=1&defaultsize3=&size=&width=&color=&price=&brand=&instoreavailability=false&lastfilter=&sizeFinderId=0&resultsmode=&segmentId=0&sort=newest&sortreverse=0"]

    def parse(self, response):
        hxs = Selector(response)
        titles = hxs.xpath("//div[@class='fashion-item']")
        items = []
        for titles in titles[:1]:
            item = bed2Item()
            item["strTitle"] = titles.xpath("div[2]/a[1]/text()").extract()
            item["strLink"] = urlparse.urljoin(response.url, titles.xpath("div[2]/a[1]/@href").extract()[0])
            item["strPrice"] = "0"
            item["strPicture"] = titles.xpath("a/div[1]/img/@src").extract()
            items.append(item)
        return items

The URL I'm scraping is:

http://shop.nordstrom.com/c/bedding-home?origin=leftnav#category=b60175057&type=category&marketingslots=2&page=1&defaultsize3=&size=&width=&color=&price=&brand=&instoreavailability=false&lastfilter=&sizeFinderId=0&resultsmode=&segmentId=0&sort=newest&sortreverse=0

I'm trying to get the first product.

Thanks

Upvotes: 1

Views: 729

Answers (1)

Jens Erat

Reputation: 38682

Looking at the page source, I cannot find the anchor tag your XPath expects when retrieving the picture URL, so omit it. Furthermore, some JavaScript seems to run after the page loads, and the image URL is stored in @data-original.

item["strPicture"] = titles.xpath("div[1]/div/img/@data-original").extract()

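Anyway, as there isn't any other picture in that div, why not just query .//img/@data-original? The same applies to the title, which could be queried via its class attribute, as in .//a[@class='title']/text(). Note that data(.//a[@class='title']) is an XPath 2.0 function and will not work in Scrapy, whose lxml backend only supports XPath 1.0; the more robust XPath 1.0 equivalent is string(.//a[@class='title']), which also picks up text from nested elements.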
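To illustrate the idea of relative, class-based queries, here is a minimal sketch that runs with only the Python standard library. It assumes a hypothetical, simplified product-tile markup (SAMPLE, with a placeholder @src and the real URL in @data-original); the live Nordstrom page may look different. The helper name extract_item is also hypothetical. It uses xml.etree.ElementTree instead of Scrapy's Selector; ElementTree supports only a subset of XPath and cannot select attributes with a /@attr step, so attributes are read with .get().

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for one product tile. The real page may differ;
# this only mirrors the structure described in the answer (image URL in data-original).
SAMPLE = """
<body>
  <div class="fashion-item">
    <div><div><img src="placeholder.gif" data-original="http://example.com/product.jpg"/></div></div>
    <div><a class="title" href="/s/some-product">Some Product</a></div>
  </div>
</body>
"""


def extract_item(tile):
    """Pull the title and picture URL from one product tile using relative paths."""
    # './/img' searches all descendants of the tile, so the exact nesting depth
    # (and any missing anchor tag) does not matter.
    img = tile.find(".//img")
    # Read data-original rather than src, which holds only a placeholder here.
    picture = img.get("data-original") if img is not None else None

    # Select the title link by its class attribute instead of by position.
    link = tile.find(".//a[@class='title']")
    # itertext() joins all text inside the link, similar to XPath's string().
    title = "".join(link.itertext()).strip() if link is not None else None

    return {"strTitle": title, "strPicture": picture}


if __name__ == "__main__":
    page = ET.fromstring(SAMPLE)
    for tile in page.findall(".//div[@class='fashion-item']"):
        print(extract_item(tile))
```

In a Scrapy spider the same approach would be the relative XPath expressions from the answer, evaluated on each tile selector; ElementTree is used here only so the sketch runs without third-party packages.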

Upvotes: 2
