Reputation: 1053
For some reason I can't get the XPath right to grab a product image from Nordstrom.com. I'm using Scrapy. Here's my code; strPicture always comes out empty:
from scrapy.spider import Spider
from scrapy.selector import Selector
from bed2.items import bed2Item
import urlparse

class MySpider(Spider):
    name = "bed2"
    allowed_domains = ["nordstrom.com", "nordstromimage.com"]
    start_urls = ["http://shop.nordstrom.com/c/bedding-home?origin=leftnav#category=b60175057&type=category&marketingslots=2&page=1&defaultsize3=&size=&width=&color=&price=&brand=&instoreavailability=false&lastfilter=&sizeFinderId=0&resultsmode=&segmentId=0&sort=newest&sortreverse=0"]

    def parse(self, response):
        hxs = Selector(response)
        titles = hxs.xpath("//div[@class='fashion-item']")
        items = []
        # Only the first product for now
        for titles in titles[:1]:
            item = bed2Item()
            item["strTitle"] = titles.xpath("div[2]/a[1]/text()").extract()
            item["strLink"] = urlparse.urljoin(response.url, titles.xpath("div[2]/a[1]/@href").extract()[0])
            item["strPrice"] = "0"
            item["strPicture"] = titles.xpath("a/div[1]/img/@src").extract()
            items.append(item)
        return items
The URL I'm scraping is the one in start_urls.
I'm trying to get the first product.
Thanks
Upvotes: 1
Views: 729
Reputation: 38682
Looking at the page source, I cannot find the anchor tag your XPath expects when retrieving the picture URL, so omit it. Further, some JavaScript magic seems to happen after the page loads: the image URL is stored in @data-original.
item["strPicture"] = titles.xpath("div[1]/div/img/@data-original").extract()
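Anyway, since there is no other picture in that div, why not just query .//img/@data-original? The same applies to the title, which could be queried by its class attribute, as in .//a[@class='title']/text(), or, even more robustly, by its string value with string(.//a[@class='title']). (In XPath 2.0 that would be data(.//a[@class='title']), but Scrapy's selectors are built on lxml, which only supports XPath 1.0.)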
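To try these relative, class-based lookups without hitting the live site, here is a minimal sketch using only the standard library's ElementTree, which supports a small subset of XPath. The markup below is a hypothetical, simplified product tile (the class names other than fashion-item and title, and all URLs, are made up), and extract_product is an illustrative helper, not part of Scrapy.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for one product tile; the real page differs.
FRAGMENT = """
<div class="fashion-item">
  <div class="fashion-photo">
    <div><img src="placeholder.gif" data-original="http://example.com/product.jpg" /></div>
  </div>
  <div class="info">
    <a class="title" href="/s/example-product/123">  Example <b>Duvet</b> Cover </a>
  </div>
</div>
"""


def extract_product(tile):
    """Pull title, link and picture out of one tile using relative paths."""
    # ".//img" finds the image anywhere below the tile, so the lookup does
    # not depend on how many wrapper divs or anchors sit in between.
    img = tile.find(".//img")
    # Select the title link by its class attribute instead of its position.
    link = tile.find(".//a[@class='title']")

    picture = img.get("data-original") if img is not None else None
    href = link.get("href") if link is not None else None
    # Join all nested text nodes and collapse whitespace, similar in spirit
    # to taking the element's string value rather than a single text() node.
    title = " ".join("".join(link.itertext()).split()) if link is not None else None
    return {"strTitle": title, "strLink": href, "strPicture": picture}


tile = ET.fromstring(FRAGMENT)
product = extract_product(tile)
print(product)
```

In the spider itself, the equivalent lookups would be the Scrapy calls from above, for example titles.xpath(".//img/@data-original").extract().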
Upvotes: 2