XO39

Reputation: 483

How to scrape data generated with infinite scroll?

How do I scrape the list of products from this page with Scrapy?

I have tried the AJAX request URL that the browser sends:

https://www.amazon.cn/gp/profile/A34PAP6LGJIN6N/more?next_batch_params%5Breview_offset%5D=10&_=1469081762384

but it returns 404.

Upvotes: 0

Views: 171

Answers (1)

Granitosaurus

Reputation: 21406

You need to replicate the headers you see in the request.

If you inspect the request headers in your browser's developer tools (Network tab), you can see them:

[Screenshot: amazon.ca next page headers]

Copy a few of these values into the headers of your scrapy.Request. For the most part you can skip Cookie, since Scrapy manages cookies by itself, and for AJAX requests like this one it usually doesn't matter.

In this case I managed to get a successful response by replicating only the X-Requested-With header (plus a User-Agent). This header tells the server that the request is an AJAX request.
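As a minimal sketch of what the spider side could look like, the snippet below builds the paginated AJAX URL from the question and the header set described above. Assumptions: the parameter names are copied from the question's URL; the `_` parameter appears to be a jQuery-style cache-busting timestamp in milliseconds; batches are assumed to advance in steps of 10 because the question shows `review_offset=10`. The names `PROFILE_MORE_URL`, `AJAX_HEADERS` and `next_batch_url` are hypothetical.

```python
import time
from urllib.parse import urlencode

# Endpoint taken from the question; it returns the next batch when you scroll.
PROFILE_MORE_URL = "https://www.amazon.cn/gp/profile/A34PAP6LGJIN6N/more"

# Hypothetical header set: Cookie is deliberately left out because Scrapy's
# cookie middleware manages cookies by itself.
AJAX_HEADERS = {
    "X-Requested-With": "XMLHttpRequest",  # marks the request as AJAX
    "User-Agent": "<some user agent>",     # placeholder: paste a real browser UA
}


def next_batch_url(offset, base_url=PROFILE_MORE_URL, timestamp_ms=None):
    """Build the URL the page requests for the batch starting at `offset`.

    `offset` fills next_batch_params[review_offset]; `_` is assumed to be a
    cache-busting timestamp in milliseconds (current time if not given).
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    # urlencode percent-encodes the brackets as %5B / %5D, matching the browser.
    query = urlencode({
        "next_batch_params[review_offset]": offset,
        "_": timestamp_ms,
    })
    return f"{base_url}?{query}"


if __name__ == "__main__":
    # Offsets 10, 20, 30 assume batches of 10 items (an assumption).
    for offset in (10, 20, 30):
        print(next_batch_url(offset))
```

In a spider you would yield something like `scrapy.Request(next_batch_url(offset), headers=AJAX_HEADERS, callback=...)` for each offset, presumably stopping once a batch comes back empty.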

You can test and engineer this in real time with the Scrapy shell:

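# in your terminal (quote the URL: it contains '&', which the shell would otherwise interpret)
scrapy shell '<url>'
# the initial fetch gives you 403

# then, inside the Scrapy shell:
request.headers.update({'X-Requested-With': 'XMLHttpRequest'})
request.headers.update({'User-Agent': '<some user agent>'})  # paste a real browser UA string
fetch(request)
# now the request is re-downloaded and it's 200!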
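To check the same header hypothesis outside Scrapy, here is a rough sketch using only Python's standard library (`urllib.request`, a different tool from the answer's Scrapy shell). It sends the same two headers and reports the status code, so you can compare a request with and without X-Requested-With. The function names are hypothetical, and unlike Scrapy this does not keep cookies between requests.

```python
import urllib.error
import urllib.request


def build_ajax_request(url, user_agent):
    """Return a urllib Request carrying the same two headers as the shell session."""
    return urllib.request.Request(url, headers={
        "X-Requested-With": "XMLHttpRequest",  # flags the request as AJAX
        "User-Agent": user_agent,              # a real browser UA string
    })


def fetch_status(url, user_agent, with_ajax_header=True):
    """Download `url` and return the HTTP status code (e.g. 200 or 403)."""
    if with_ajax_header:
        request = build_ajax_request(url, user_agent)
    else:
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; its .code holds the status.
        return err.code
```

Calling `fetch_status(url, ua, with_ajax_header=False)` and then `fetch_status(url, ua)` on the captured URL should show whether the header alone makes the difference for that endpoint.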

Upvotes: 2
