I'm trying to get the product rating information from target.com. The URL for the product is
After looking through `response.body`, I found that the rating information is not loaded statically, so I need to get it some other way. I found some similar questions saying that, in order to get dynamic data, I need to
I am stuck at step 2 right now. I found that one XHR named 15258543 contains the rating distribution, but I don't know how to send a request to get the JSON: which URL to send it to, and with what parameters.
Can someone walk me through this? Thank you!
The trickiest part is getting the 15258543 product ID dynamically and then using it inside the URL to fetch the reviews. This product ID can be found in multiple places on the product page; for instance, there is a `meta` element we can use:
```html
<meta itemprop="productID" content="15258543">
```
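If you want to check the ID extraction outside Scrapy, here is a minimal sketch using only the standard library. It assumes the page contains the `<meta itemprop="productID">` tag shown above and, as a fallback, that the product URL contains an `/A-<digits>` segment like the one in the spider's `start_urls`. `MetaProductIdParser` and `find_product_id` are hypothetical helper names, not part of Scrapy or any Target API.

```python
import re
from html.parser import HTMLParser
from typing import Optional


class MetaProductIdParser(HTMLParser):
    """Collects the content of the first <meta itemprop="productID"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.product_id: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names arrive lowercased
        if tag == "meta" and self.product_id is None:
            attr_map = dict(attrs)
            if attr_map.get("itemprop") == "productID":
                self.product_id = attr_map.get("content")


def find_product_id(html: str, url: str = "") -> Optional[str]:
    """Return the product ID from the meta tag, falling back to the URL."""
    parser = MetaProductIdParser()
    parser.feed(html)
    if parser.product_id:
        return parser.product_id
    # Assumption: product URLs contain a segment like /A-15258543
    match = re.search(r"/A-(\d+)", url)
    return match.group(1) if match else None


page = '<html><head><meta itemprop="productID" content="15258543"></head></html>'
print(find_product_id(page))  # 15258543
print(find_product_id("<html></html>", "http://www.target.com/p/bounty/-/A-15258543#prodSlot=medium_1_4"))  # 15258543
```

The URL fallback is only a convenience; the `meta` tag is the source the answer relies on.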
Here is a working spider that makes a separate GET request to get the reviews, loads the JSON response via `json.loads()`, and prints the overall product rating:
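```python
import json

import scrapy


class TargetSpider(scrapy.Spider):
    name = "target"
    allowed_domains = ["target.com"]
    start_urls = ["http://www.target.com/p/bounty-select-a-size-paper-towels-white-8-huge-rolls/-/A-15258543#prodSlot=medium_1_4&term=bounty"]

    def parse(self, response):
        # The product ID is exposed in a <meta itemprop="productID"> tag on the product page
        product_id = response.xpath("//meta[@itemprop='productID']/@content").extract_first()
        if product_id is None:
            self.logger.warning("No productID meta tag found on %s", response.url)
            return
        # Request the review statistics for this product, passing the ID along to the callback
        return scrapy.Request("http://tws.target.com/productservice/services/reviews/v1/reviewstats/" + product_id,
                              callback=self.parse_ratings,
                              meta={"product_id": product_id})

    def parse_ratings(self, response):
        # The JSON response is keyed by product ID under "result"
        data = json.loads(response.body)
        print(data["result"][response.meta["product_id"]]["coreStats"]["AverageOverallRating"])
```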
Prints `4.5585`.
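The chained lookup in `parse_ratings` raises `KeyError` if any level is missing (for example, if the endpoint changes or has no stats for the product). Below is a minimal sketch of a more defensive lookup, assuming the response uses the same `result` → product ID → `coreStats` → `AverageOverallRating` nesting the spider reads. `dig` and `extract_average_rating` are hypothetical helpers, and the sample payload is made up for illustration, not a real Target response.

```python
import json
from typing import Any, Optional


def dig(obj: Any, *keys: str) -> Any:
    """Walk nested dicts by key; return None as soon as a level is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def extract_average_rating(body: bytes, product_id: str) -> Optional[float]:
    """Return AverageOverallRating for product_id, or None if it is absent."""
    data = json.loads(body)  # json.loads accepts UTF-8 bytes as well as str
    rating = dig(data, "result", product_id, "coreStats", "AverageOverallRating")
    return float(rating) if rating is not None else None


# Hypothetical payload shaped like the keys the spider reads
sample = json.dumps(
    {"result": {"15258543": {"coreStats": {"AverageOverallRating": 4.2}}}}
).encode("utf-8")

print(extract_average_rating(sample, "15258543"))  # 4.2
print(extract_average_rating(sample, "99999999"))  # None
```

Inside the spider, `parse_ratings` could call `extract_average_rating(response.body, response.meta["product_id"])` and skip the item when it returns `None`.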