darbulix

Reputation: 161

XPath selector returns null in Scrapy shell

I am trying to scrape some data from this URL: https://www.farfetch.com/shopping/men/gucci-white-rhyton-web-print-leather-sneakers-item-12889013.aspx?storeid=9359

The HTML looks something like this:

<div class="cdb2b6" id="bannerComponents-Container">
  <p class="_41db0e _527bd9 eda00d" data-tstid="merchandiseTag">New Season</p>
  <div class="_1c3e57">
    <h1 class="_61cb2e" itemprop="brand" itemscope="" itemtype="http://schema.org/Brand">
    <a href="/shopping/men/gucci/items.aspx" class="fd9e8e e484bf _4a941d f140b0" data-trk="pp_infobrd" data-tstid="cardInfo-title" itemprop="url" aria-label="Gucci">
    <span itemprop="name">Gucci</span>
    </a>
    </h1>
  </div>
 </div>

I ran response.xpath('//div[@id="bannerComponents-Container"]/@class') in the scrapy shell, but all I get is:

In [1]: response.xpath('//div[@id="bannerComponents-Container"]/@class')
Out[1]: []

Why? I am running into similar issues on Amazon, eBay, etc., where my XPath selectors don't seem to work.
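A quick sanity check in the same scrapy shell session (a minimal diagnostic sketch) is to inspect the response Scrapy actually downloaded, since it may not match the HTML seen in the browser:

# check the HTTP status of the fetched page
response.status

# see whether the element id appears in the HTML Scrapy actually received
'bannerComponents-Container' in response.text

# open the downloaded HTML in a browser to compare with the live page
view(response)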

Upvotes: 0

Views: 400

Answers (1)

SIM

Reputation: 22440

It's because of the headers. Define a User-Agent header and you will get what you are after; if you leave the headers out, the result comes back as None. Try the following:

import requests
from scrapy import Selector

LINK = 'https://www.farfetch.com/bd/shopping/men/gucci-white-rhyton-web-print-leather-sneakers-item-12889013.aspx?storeid=9359'

def get_item(url):
    # Send a browser-like User-Agent; without it the site serves a page
    # that does not contain the product markup.
    res = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    # Build a Scrapy selector from the downloaded HTML.
    sel = Selector(text=res.text)
    name = sel.xpath('//div[@id="bannerComponents-Container"]//span[@itemprop="name"]/text()').extract_first()
    print(name)

if __name__ == '__main__':
    get_item(LINK)

Output:

Gucci
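The same idea carries over to the scrapy shell itself. Assuming the empty result was caused by Scrapy's default User-Agent, re-fetching the page with a browser-like header should make the XPath from the question work (a minimal sketch using the shell's fetch helper):

# inside `scrapy shell`
from scrapy import Request

url = 'https://www.farfetch.com/shopping/men/gucci-white-rhyton-web-print-leather-sneakers-item-12889013.aspx?storeid=9359'
fetch(Request(url, headers={'User-Agent': 'Mozilla/5.0'}))

# the selector from the question should now return the class attribute
response.xpath('//div[@id="bannerComponents-Container"]/@class').extract_first()

Alternatively, you can start the shell with the setting overridden, e.g. scrapy shell -s USER_AGENT="Mozilla/5.0" "<url>", and run the original selector unchanged.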

Upvotes: 2
