Reputation: 161
I am trying to scrape some data from this URL: https://www.farfetch.com/shopping/men/gucci-white-rhyton-web-print-leather-sneakers-item-12889013.aspx?storeid=9359
The HTML looks something like this:
<div class="cdb2b6" id="bannerComponents-Container">
<p class="_41db0e _527bd9 eda00d" data-tstid="merchandiseTag">New Season</p>
<div class="_1c3e57">
<h1 class="_61cb2e" itemprop="brand" itemscope="" itemtype="http://schema.org/Brand">
<a href="/shopping/men/gucci/items.aspx" class="fd9e8e e484bf _4a941d f140b0" data-trk="pp_infobrd" data-tstid="cardInfo-title" itemprop="url" aria-label="Gucci">
<span itemprop="name">Gucci</span>
</a>
</h1>
</div>
</div>
I ran response.xpath('//div[@id="bannerComponents-Container"]/@class') in the Scrapy shell, but all I get is:
In [1]: response.xpath('//div[@id="bannerComponents-Container"]/@class')
Out[1]: []
Why? I am running into similar issues on Amazon, eBay, etc., where my XPath selector doesn't seem to work.
Upvotes: 0
Views: 400
Reputation: 22440
It's because of the headers. Define a User-Agent and you get what you are after. If you drop the headers, the result becomes None. Try the following:
import requests
from scrapy import Selector

LINK = 'https://www.farfetch.com/bd/shopping/men/gucci-white-rhyton-web-print-leather-sneakers-item-12889013.aspx?storeid=9359'

def get_item(url):
    # Send a browser-like User-Agent; without it the XPath below matches nothing.
    res = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    # Build the selector from the response body text.
    sel = Selector(text=res.text)
    name = sel.xpath('//div[@id="bannerComponents-Container"]//span[@itemprop="name"]/text()').extract_first()
    print(name)

if __name__ == '__main__':
    get_item(LINK)
Output:
Gucci
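To convince yourself that the XPath is not the problem, you can run the same lookups against the markup posted in the question, offline. The sketch below is only an illustration: it uses the standard library's xml.etree.ElementTree (which supports a limited XPath subset) as a stand-in for the Scrapy selector, it assumes the posted snippet is representative of the real page, and check_selectors is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Markup copied from the question. It happens to be well-formed XML,
# so the standard-library parser can read it without Scrapy installed.
HTML = """
<div class="cdb2b6" id="bannerComponents-Container">
  <p class="_41db0e _527bd9 eda00d" data-tstid="merchandiseTag">New Season</p>
  <div class="_1c3e57">
    <h1 class="_61cb2e" itemprop="brand" itemscope="" itemtype="http://schema.org/Brand">
      <a href="/shopping/men/gucci/items.aspx" class="fd9e8e e484bf _4a941d f140b0" data-trk="pp_infobrd" data-tstid="cardInfo-title" itemprop="url" aria-label="Gucci">
        <span itemprop="name">Gucci</span>
      </a>
    </h1>
  </div>
</div>
"""


def check_selectors(markup):
    """Return (banner class, brand name), or (None, None) if the banner is missing."""
    # Wrap in a dummy root so the banner div is a descendant that './/div[...]' can reach.
    root = ET.fromstring(f"<root>{markup}</root>")
    banner = root.find(".//div[@id='bannerComponents-Container']")
    if banner is None:
        return None, None
    span = banner.find(".//span[@itemprop='name']")
    name = span.text if span is not None else None
    return banner.get("class"), name


banner_class, brand = check_selectors(HTML)
print(banner_class, brand)
```

Both lookups succeed on the posted markup, so an empty result in the Scrapy shell suggests the server returned a different page for that request rather than a selector mistake. Printing the start of response.text in the shell is a quick way to see what actually came back.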
Upvotes: 2