BeautifulSoup-Python : How do you scrape data that has not been loaded yet?

Question

I tried scraping the page with BeautifulSoup, but it returns []. When I viewed the page source, I found a div with class="loading32".

How do you scrape this kind of element?

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = productUrl  # productUrl holds the product page URL (given below)
uClient = uReq(my_url)  # open the connection
page_html = uClient.read()  # download the raw HTML
uClient.close()
page_soup = soup(page_html, "html.parser")  # parse the HTML with BeautifulSoup
description = page_soup.findAll("div", {"class": "ui-box product-description-main"})
string4 = str(description)

URL: https://www.aliexpress.com/store/product/100-Original-16-Shimano-Casitas-150-151-150hg-151hg-Right-Left-Hand-Baitcasting-Fishing-Reel-4/1053031_32657797704.html?spm=2114.12010608.0.0.22e12d66I7a3Dp


        Product Description

Martin Evans · Accepted Answer

The information is all there in the HTML that is returned, so JavaScript is not needed to get it. You just need to look through that HTML and decide the best method to extract each item you want. I have guessed that you might be trying to get something like the following:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup

my_url = 'https://www.aliexpress.com/store/product/100-Original-16-Shimano-Casitas-150-151-150hg-151hg-Right-Left-Hand-Baitcasting-Fishing-Reel-4/1053031_32657797704.html?spm=2114.12010608.0.0.22e12d66I7a3Dp'
uClient = uReq(my_url)  # open the connection
page_html = uClient.read()  # download the raw HTML
uClient.close()

soup = BeautifulSoup(page_html, "html.parser")  # parse the HTML

details = {}
details['Product Name'] = soup.find('h1', class_='product-name').text
details['Price Range'] = soup.find('div', class_='p-current-price').find_all('span')[1].text

item_specifics = soup.find('ul', class_='product-property-list util-clearfix')
for li in item_specifics.find_all('li'):
    entry = li.get_text(strip=True).split(':')
    details[entry[0]] = ', '.join(entry[1:])

# Locate the main product image
thumb = soup.find('div', class_='ui-image-viewer-thumb-wrap')
url = thumb.img['src']
details['Image URL'] = url
details['Image Filename'] = url.rsplit('/', 1)[1]

for item, desc in details.items():
    print('{:30} {}'.format(item, desc))

This would give you the following information:

Product Name                   Original 2016 Shimano Casitas 150 151 150hg 151hg Right Left Hand Baitcasting Fishing Reel 4+1BB 5.5kg SVS Infinity fishing reel
Price Range                    83.60 - 85.60
Fishing Method                 Bait Casting
Baits Type                     Fake Bait
Position                       Ocean Rock Fshing,River,Stream,Reservoir Pond,Ocean Beach Fishing,Lake,Ocean Boat Fishing
Fishing Reels Type             Baitcast Reel
Model Number                   Casitas
Brand Name                     Shimano
Ball Bearings                  4+1BB
Feature 1                      Shimano Stable Spool S3D
Feature 2                      SVS Infinity Brake System (Infinite Cast Control)
Model                          150/ 151/ 150HG/ 151HG
PE Line (50 test /m)           20-150/30-135/ 40-105
Nylon Line (51hg test /m)      10-120/12-110/14-90
Weight                         190g
Gear Ratio                     6.3, 1 / 7.2, 1
Made in                        Malaysia
Image URL                      https://ae01.alicdn.com/kf/HTB1qRKzJFXXXXboXVXXq6xXFXXXU/Original-2016-Shimano-Casitas-150-151-150hg-151hg-Right-Left-Hand-Baitcasting-Fishing-Reel-4-1BB.jpg_640x640.jpg
Image Filename                 Original-2016-Shimano-Casitas-150-151-150hg-151hg-Right-Left-Hand-Baitcasting-Fishing-Reel-4-1BB.jpg_640x640.jpg
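
Note that the Gear Ratio row reads "6.3, 1 / 7.2, 1" because the loop splits each entry on every colon and rejoins the pieces with commas. If the page text was presumably something like "6.3:1 / 7.2:1", splitting only on the first colon would keep the value intact. A minimal sketch, where split_property is a hypothetical helper and the sample strings are illustrative rather than copied from the page:

```python
def split_property(text):
    """Split 'Name:value' on the first colon only, so colons in the value survive."""
    key, _, value = text.partition(":")
    return key.strip(), value.strip()


# Illustrative inputs (assumed, not taken verbatim from the page)
print(split_property("Brand Name:Shimano"))
print(split_property("Gear Ratio:6.3:1 / 7.2:1"))
```

In the loop above, this would replace the split(':') and ', '.join(...) pair with a single call such as key, value = split_property(li.get_text(strip=True)).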

The image URL and filename are also stored. The image could then be downloaded with another uReq call, writing the response as binary data to a file with the filename obtained.
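
A minimal sketch of that download step, using only the standard library. The function name download_image and the dest_dir parameter are hypothetical, chosen for illustration; the filename is derived from the URL in the same way as in the script above:

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download_image(url, dest_dir="."):
    """Fetch url and save the raw bytes in dest_dir, named after the last URL segment."""
    filename = url.rsplit("/", 1)[1]
    target = Path(dest_dir) / filename
    # Open the output file in binary mode so the image bytes are written unchanged
    with urlopen(url) as response, open(target, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return target
```

With the details dictionary from the script above, this could be called as download_image(details['Image URL']), which would save the file in the current directory.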
