user12952651
user12952651

Reputation:

BeautifulSoup: find_all() returns an empty list

After checking the source page of the website I am crawling, I guess the reason that I could not fetch the content I wanted is that there's no div element in the source page. I also tried using css selector (in another question BeautifulSoup: Why .select method returned an empty list?), but that won't work either. Here's some of my code:

# Scraping top products sales and name from the Recommendation page

from selenium import webdriver
from bs4 import BeautifulSoup as bs
import json
import requests
import numpy as np
import pandas as pd

headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36',
    'cookie': '_gcl_au=1.1.961206468.1594951946; _med=refer; _fbp=fb.2.1594951949275.1940955365; SPC_IA=-1; SPC_F=y1evilme0ImdfEmNWEc08bul3d8toc33; REC_T_ID=fab983c8-c7d2-11ea-a977-ccbbfe23657a; SPC_SI=uv1y64sfvhx3w6dir503ixw89ve2ixt4; _gid=GA1.3.413262278.1594951963; SPC_U=286107140; SPC_EC=GwoQmu7TiknULYXKODlEi5vEgjawyqNcpIWQjoxjQEW2yJ3H/jsB1Pw9iCgGRGYFfAkT/Ej00ruDcf7DHjg4eNGWbCG+0uXcKb7bqLDcn+A2hEl1XMtj1FCCIES7k17xoVdYW1tGg0qaXnSz0/Uf3iaEIIk7Q9rqsnT+COWVg8Y=; csrftoken=5MdKKnZH5boQXpaAza1kOVLRFBjx1eij; welcomePkgShown=true; _ga=GA1.1.1693450966.1594951955; _dc_gtm_UA-61904553-8=1; REC_MD_30_2002454304=1595153616; _ga_SW6D8G0HXK=GS1.1.1595152099.14.1.1595153019.0; REC_MD_41_1000044=1595153318_0_50_0_49; SPC_R_T_ID="Am9bCo3cc3Jno2mV5RDkLJIVsbIWEDTC6ezJknXdVVRfxlQRoGDcya57fIQsioFKZWhP8/9PAGhldR0L/efzcrKONe62GAzvsztkZHfAl0I="; SPC_T_IV="IETR5YkWloW3OcKf80c6RQ=="; SPC_R_T_IV="IETR5YkWloW3OcKf80c6RQ=="; SPC_T_ID="Am9bCo3cc3Jno2mV5RDkLJIVsbIWEDTC6ezJknXdVVRfxlQRoGDcya57fIQsioFKZWhP8/9PAGhldR0L/efzcrKONe62GAzvsztkZHfAl0I="'
}
shopee_url = 'https://shopee.co.id/top_products'

navi_info = requests.get('https://shopee.co.id/api/v4/recommend/recommend?bundle=top_sold_product_microsite&limit=20&offset=0')
# extracts all the "index" data from all "sections"
index_arrays = [object_['index'] for object_ in navi_info.json()['data']['sections']]
index_array = index_arrays[0] # only one section with "index" key is present
# extract all catIDs from the "index" payload
catIDs = [object_['key'] for object_ in index_array]
params = {'catID': catIDs}
print(params)

# a = requests.get(link, headers=headers)
response = requests.get('https://shopee.co.id/top_products', params=params)
print(response.text)
soup = bs(response.text, 'html.parser')
products = soup.find_all('div', attrs={'class': '_3S8sOC _2QfAXF'})
print(products) # Why this returns an empty list? 
for product in products:
    name = product.select_one('#main > div > div.shopee-page-wrapper > div.page-product > div.container > div.product-briefing.flex.card._2cRTS4 > div.flex.flex-auto.k-mj2F > div > div.qaNIZv > span')
    sales = product.select_one('#main > div > div.shopee-page-wrapper > div.page-product > div.container > div.product-briefing.flex.card._2cRTS4 > div.flex.flex-auto.k-mj2F > div > div.flex._32fuIU > div.flex.SbDIui > div._22sp0A')
    print(name)
    print(sales)

Upvotes: 2

Views: 769

Answers (1)

user13244731
user13244731

Reputation:

You get no items in the find_all() list, because in the page (HTML) there are no tags with that class: '_3S8sOC _2QfAXF'

You can check this easily scraping all the elements with this class:

import requests
from bs4 import BeautifulSoup as bs

response = requests.get('https://shopee.co.id/top_products').content
soup = bs(response,"html.parser")
products = soup.find_all(class_ = "_3S8sOC _2QfAXF")

Note here that I am scraping the content attribute and also using the class_ kwarg in the find.all() method, to get all the elements with this specific class.

Unfortanaly there are no elemets. ://

print(products)

give back:

[]

Upvotes: 0

Related Questions