Reputation: 483
How can I scrape the list of products from this page with Scrapy?
I have tried the AJAX request URL that the browser sends:
https://www.amazon.cn/gp/profile/A34PAP6LGJIN6N/more?next_batch_params%5Breview_offset%5D=10&_=1469081762384
but it returns 404.
Upvotes: 0
Views: 171
Reputation: 21406
You need to replicate the headers that the browser sends with the request.
If you inspect the request in your browser's developer tools, you can see the headers it sends.
From these you need to copy a few values into the headers attribute of your scrapy.Request. For the most part you can skip the Cookie header, since Scrapy manages cookies by itself, and for AJAX requests like this one it is usually meaningless.
In this case I managed to get a successful response by replicating only the X-Requested-With header, which is used to indicate that the request is an AJAX request.
You can test this out and adjust it in real time in the Scrapy shell:
scrapy shell <url>
# gives you 403
request.headers.update({'X-Requested-With': 'XMLHttpRequest'})
request.headers.update({'User-Agent': <some user agent>})
fetch(request)
# now the request is redownloaded and it's 200!
Upvotes: 2