Reputation:
After checking the source page of the website I am crawling, I guess the reason that I could not fetch the content I wanted is that there's no div element in the source page. I also tried using css selector (in another question BeautifulSoup: Why .select method returned an empty list?), but that won't work either. Here's some of my code:
# Scraping top products sales and name from the Recommendation page
from selenium import webdriver
from bs4 import BeautifulSoup as bs
import json
import requests
import numpy as np
import pandas as pd
headers = {
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36',
'cookie': '_gcl_au=1.1.961206468.1594951946; _med=refer; _fbp=fb.2.1594951949275.1940955365; SPC_IA=-1; SPC_F=y1evilme0ImdfEmNWEc08bul3d8toc33; REC_T_ID=fab983c8-c7d2-11ea-a977-ccbbfe23657a; SPC_SI=uv1y64sfvhx3w6dir503ixw89ve2ixt4; _gid=GA1.3.413262278.1594951963; SPC_U=286107140; SPC_EC=GwoQmu7TiknULYXKODlEi5vEgjawyqNcpIWQjoxjQEW2yJ3H/jsB1Pw9iCgGRGYFfAkT/Ej00ruDcf7DHjg4eNGWbCG+0uXcKb7bqLDcn+A2hEl1XMtj1FCCIES7k17xoVdYW1tGg0qaXnSz0/Uf3iaEIIk7Q9rqsnT+COWVg8Y=; csrftoken=5MdKKnZH5boQXpaAza1kOVLRFBjx1eij; welcomePkgShown=true; _ga=GA1.1.1693450966.1594951955; _dc_gtm_UA-61904553-8=1; REC_MD_30_2002454304=1595153616; _ga_SW6D8G0HXK=GS1.1.1595152099.14.1.1595153019.0; REC_MD_41_1000044=1595153318_0_50_0_49; SPC_R_T_ID="Am9bCo3cc3Jno2mV5RDkLJIVsbIWEDTC6ezJknXdVVRfxlQRoGDcya57fIQsioFKZWhP8/9PAGhldR0L/efzcrKONe62GAzvsztkZHfAl0I="; SPC_T_IV="IETR5YkWloW3OcKf80c6RQ=="; SPC_R_T_IV="IETR5YkWloW3OcKf80c6RQ=="; SPC_T_ID="Am9bCo3cc3Jno2mV5RDkLJIVsbIWEDTC6ezJknXdVVRfxlQRoGDcya57fIQsioFKZWhP8/9PAGhldR0L/efzcrKONe62GAzvsztkZHfAl0I="'
}
shopee_url = 'https://shopee.co.id/top_products'
navi_info = requests.get('https://shopee.co.id/api/v4/recommend/recommend?bundle=top_sold_product_microsite&limit=20&offset=0')
# extracts all the "index" data from all "sections"
index_arrays = [object_['index'] for object_ in navi_info.json()['data']['sections']]
index_array = index_arrays[0] # only one section with "index" key is present
# extract all catIDs from the "index" payload
catIDs = [object_['key'] for object_ in index_array]
params = {'catID': catIDs}
print(params)
# a = requests.get(link, headers=headers)
response = requests.get('https://shopee.co.id/top_products', params=params)
print(response.text)
soup = bs(response.text, 'html.parser')
products = soup.find_all('div', attrs={'class': '_3S8sOC _2QfAXF'})
print(products) # Why this returns an empty list?
for product in products:
name = product.select_one('#main > div > div.shopee-page-wrapper > div.page-product > div.container > div.product-briefing.flex.card._2cRTS4 > div.flex.flex-auto.k-mj2F > div > div.qaNIZv > span')
sales = product.select_one('#main > div > div.shopee-page-wrapper > div.page-product > div.container > div.product-briefing.flex.card._2cRTS4 > div.flex.flex-auto.k-mj2F > div > div.flex._32fuIU > div.flex.SbDIui > div._22sp0A')
print(name)
print(sales)
Upvotes: 2
Views: 769
Reputation:
You get no items in the find_all()
list, because in the page (HTML) there are no tags with that class: '_3S8sOC _2QfAXF'
You can check this easily scraping all the elements with this class:
import requests
from bs4 import BeautifulSoup as bs
response = requests.get('https://shopee.co.id/top_products').content
soup = bs(response,"html.parser")
products = soup.find_all(class_ = "_3S8sOC _2QfAXF")
Note here that I am scraping the content
attribute and also using the class_
kwarg in the find.all()
method, to get all the elements with this specific class.
Unfortanaly there are no elemets. ://
print(products)
give back:
[]
Upvotes: 0