MITHU

Reputation: 154

Unable to parse all the asins available in a webpage

I've written a Python script to fetch all the asins available in a certain node. There are around 1000 asins in it, but the approach below fetches only 146 of them. The page number changes accordingly when I hit the SHOW MORE button at the bottom of that page, yet I get the exact same asins when I change the page number within my script.

webpage address

Here is what I've tried so far:

import re
import json
import requests
from bs4 import BeautifulSoup

node = '15529609011'

# Load the store page and read the id of the widget slot
r = requests.get(f'https://www.amazon.com/stores/node/{node}?productGridPageIndex=1')
soup = BeautifulSoup(r.content,'lxml')
slot_num = soup.select_one('.stores-widget-btf')['id']

# Fetch that slot and parse the JSON assigned to `var config` in the response
res = requests.get(f'https://www.amazon.com/stores/slot/{slot_num}?node={node}')
p = re.compile(r'var config = (.*);')
data = json.loads(p.findall(res.text)[0])
asins = data['content']['ASINList']
print(len(asins))

How can I grab all the asins available in there using requests?

Upvotes: 1

Views: 219

Answers (1)

hunzter

Reputation: 598

The data behind the Show More button is loaded via an AJAX request, so it is not part of the HTML that the initial requests call returns.

You can either:

  1. Easier, but more resource-intensive: use a headless browser (e.g. headless Chrome via chromedriver) with Selenium.
  2. Harder, but lighter: open the browser's Dev Tools, find and analyze the AJAX request, then build the same request and send it from Python.
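For the second option, the loop around the AJAX call could look roughly like the sketch below. It assumes each response embeds the same `var config = {...};` object with `content` and `ASINList` keys that the question's code already parses; that is not confirmed for the Show More responses. The names `extract_asins`, `collect_asins`, and `fetch_page` are hypothetical helpers, and `fetch_page` is deliberately left as a callable because the real URL and parameters have to be copied from the request shown in the Dev Tools network tab.

```python
import json
import re
from typing import Callable, List

# Same pattern the question uses to pull the embedded config object.
CONFIG_RE = re.compile(r'var config = (.*);')


def extract_asins(body: str) -> List[str]:
    """Return the ASIN list from a response body, or [] if no config is found."""
    match = CONFIG_RE.search(body)
    if match is None:
        return []
    config = json.loads(match.group(1))
    # Assumed layout: {"content": {"ASINList": [...]}}, as in the question.
    return config.get('content', {}).get('ASINList', [])


def collect_asins(fetch_page: Callable[[int], str], max_pages: int = 50) -> List[str]:
    """Request pages 1..max_pages and stop once a page adds no new ASINs.

    fetch_page(page) is a hypothetical callable that returns the response
    text for one page of the AJAX request.
    """
    seen = {}  # dict keeps insertion order, so results stay in page order
    for page in range(1, max_pages + 1):
        new = [a for a in extract_asins(fetch_page(page)) if a not in seen]
        if not new:
            # Either the last page was passed or the server keeps repeating
            # the same data, which is the symptom described in the question.
            break
        seen.update(dict.fromkeys(new))
    return list(seen)
```

In practice `fetch_page` would wrap a `requests.get` or `requests.post` call that reproduces the captured request, including any headers or tokens it carries. Stopping when a page contributes nothing new also guards against looping forever if the page parameter is ignored.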

Upvotes: 1
