Tapasweni Pathak

Reputation: 562

Xpath is not extracting what it should extract

I am using Scrapy to parse a website. This is one product link.

The XPath expressions I have tried for extracting product prices are:

sel.xpath ('//div[@class="product-price"]/input/div[@id="product_price"]/text()').extract()
sel.xpath ('//div[@id="product_price"]/text()').extract()
sel.xpath ('//div[@class="product-size-qua-info"]/div[@class="product-price"]/input/div[@id="product_price"]/text()').extract()
sel.xpath ('//div[@class="product-size-qua-info"]/div/input/div[@id="product_price"]/text()').extract()
sel.xpath ('//div[@class="product-size-qua-info"]/div/input/div/text()').extract()
sel.xpath ('//div[@class="product-size-qua-info"]/div/div/text()').extract()
sel.xpath ('//div[@class="product-size-qua-info"]/div//div/text()').extract()
sel.xpath ('//div[@class="product-size-qua-info"]/div[2]/text()').extract()
sel.xpath ('//div[@class="product-size-qua-info"]/div[2]//text()').extract()
sel.xpath ('//div[@id="product_price"]//text()').extract()

None of them works. Some are just random attempts.

What is the correct XPath to extract the product price from the URL?

Upvotes: 1

Views: 153

Answers (2)

Arthur Burkhardt

Reputation: 700

The problem here is that the price and the size are retrieved by a JavaScript function. That explains why you don't see them in the response, but you do see them in the DOM in your browser. This is not a Scrapy-specific issue.

Since this website relies heavily on JavaScript, look at the raw page source rather than inspecting elements with Firebug or Chrome Developer Tools, which show the DOM after scripts have run. Parsing this website with Scrapy alone is perfectly feasible (and more efficient), but you could also use Selenium, which supports JavaScript.
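To illustrate the difference between the raw response and the rendered DOM, here is a minimal sketch using only the Python standard library. Both HTML snippets and the price value are made up for illustration and are not copied from the site; `DivTextCollector` and `text_in` are hypothetical helper names.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class DivTextCollector(HTMLParser):
    """Collect non-blank text found inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while we are inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def text_in(html, target_id):
    """Return the list of text chunks inside the element with id=target_id."""
    parser = DivTextCollector(target_id)
    parser.feed(html)
    return parser.chunks


# Hypothetical markup: what the server might send before any script runs.
RAW_HTML = '<div class="product-price"><div id="product_price"></div></div>'
# Hypothetical markup: what the browser might show after the script fills it in.
RENDERED_HTML = '<div class="product-price"><div id="product_price">Rs. 1,200</div></div>'

print(text_in(RAW_HTML, "product_price"))
print(text_in(RENDERED_HTML, "product_price"))
```

If the element is empty in the raw source, as in the first snippet, no XPath run against the Scrapy response can return the price, however it is written.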

To get the price and size, you have to perform two additional POST requests to http://www.goodearth.in/Wishlist.ashx, with the following parameters:

size: ACTION=CheckInventoryforSizes&ProductID=2060&VariantID=2060&Sizes=&ChosenColor=FFFFFF-Multi&isProductDetails=true

price: ACTION=GetProductPrice&ProductID=2060&VariantID=2060&ChosenSize=&ChosenColor=FFFFFF-Multi&View=productdetail
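As a rough sketch of how those two requests could be prepared, the following uses only the Python standard library and builds the POST requests without sending them. The URL and parameter strings are the ones quoted above; the names `SIZE_QUERY`, `PRICE_QUERY`, `to_params` and `build_post` are hypothetical, and the format of the server's reply is not known here, so parsing the response is left out.

```python
from urllib.parse import parse_qsl, urlencode
from urllib.request import Request

WISHLIST_URL = "http://www.goodearth.in/Wishlist.ashx"

# Parameter strings as quoted in the answer above.
SIZE_QUERY = (
    "ACTION=CheckInventoryforSizes&ProductID=2060&VariantID=2060"
    "&Sizes=&ChosenColor=FFFFFF-Multi&isProductDetails=true"
)
PRICE_QUERY = (
    "ACTION=GetProductPrice&ProductID=2060&VariantID=2060"
    "&ChosenSize=&ChosenColor=FFFFFF-Multi&View=productdetail"
)


def to_params(query):
    """Turn a query string into a dict, keeping empty values such as 'Sizes='."""
    return dict(parse_qsl(query, keep_blank_values=True))


def build_post(params):
    """Build (but do not send) a form-encoded POST request to the endpoint."""
    body = urlencode(params).encode("ascii")
    return Request(
        WISHLIST_URL,
        data=body,  # supplying data makes urllib issue a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


size_params = to_params(SIZE_QUERY)
price_params = to_params(PRICE_QUERY)

size_request = build_post(size_params)
price_request = build_post(price_params)

print(price_request.get_method(), price_request.full_url)
print(price_request.data.decode("ascii"))
```

Inside a Scrapy spider, the same dictionaries could be passed as the `formdata` argument of `scrapy.FormRequest`, with a callback that parses whatever the endpoint returns. The product and variant IDs would presumably have to be read from each product page rather than hard-coded.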

Upvotes: 1

Karl M.W.

Reputation: 747

By the look of it, the price is always contained in a single div with id=product_price.

The markup also looks well written, in that no product page has a duplicate id=product_price.

You can therefore simply use:

//div[@id='product_price']/text()

What happened when you tried sel.xpath ('//div[@id="product_price"]/text()').extract()? This should be the correct pattern, the only difference being that I switched my single & double quotes.

It may seem a trivial change, but try:

sel.xpath ("//div[@id='product_price']/text()").extract()

Upvotes: 0
