Jordan P

Reputation: 31

Error: None while trying to scrape data using BeautifulSoup

I am new to web scraping. I am trying to scrape the heading (QCY T5 Wireless Bluetooth Earphones V5.0 Touch Control Stereo HD talking with 380mAh battery) using BeautifulSoup, but the output is None. Here is the code I have tried:

from bs4 import BeautifulSoup
import requests

page = requests.get('https://www.daraz.pk/products/qcy-t5-wireless-bluetooth-earphones-v50-touch-control-stereo-hd-talking-with-380mah-battery-i143388262-s1304364361.html?spm=a2a0e.searchlist.list.1.5b7c4a71Jr4QZb&search=1')
soup = BeautifulSoup(page.content, 'html.parser')
print(page.status_code)

heading = soup.find(class_='pdp-mod-product-badge-title')
print(heading)

The HTML from the website:

<div class="pdp-mod-product-badge-wrapper"><span class="pdp-mod-product-badge-title" data-spm-anchor-id="a2a0e.pdp.0.i0.4f257123ixGMNY">QCY T5 Wireless Bluetooth Earphones V5.0 Touch Control Stereo HD talking with 380mAh battery</span></div>

Upvotes: 0

Views: 90

Answers (2)

Dark Star

Reputation: 53

You can't fetch the data because the page's View Source doesn't contain the class you mentioned.

A common beginner mistake is to find an element in the browser's Inspect tab and use its classes for scraping. The Inspect tab shows the DOM after JavaScript has run, which is not what requests downloads.

To be sure the data is there, always open View Source (Ctrl + U) and look for your content. In many cases the content is rendered dynamically by a JS file and API calls, which you can find in the Network tab.

That is the case here: the information you are looking for is loaded dynamically and is not available in the page source.
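The View Source check can also be done from Python before writing any selectors. This is only a minimal sketch: `find_missing_markers` is a hypothetical helper name, and `static_html` is a made-up stand-in for what `requests.get(url).text` would return for a JavaScript-rendered page, not the real Daraz response.

```python
def find_missing_markers(html, markers):
    """Return the markers that do not occur anywhere in the raw HTML text."""
    return [marker for marker in markers if marker not in html]


# Made-up example of a page whose visible content is filled in later by a script.
static_html = (
    '<html><body><div id="root"></div>'
    '<script src="app.js"></script></body></html>'
)

missing = find_missing_markers(
    static_html,
    ["pdp-mod-product-badge-title", 'id="root"'],
)
print(missing)
```

Any class that ends up in `missing` is not in the downloaded HTML, so `soup.find` will return None for it no matter which parser you use. In a real script you would pass `page.text` from your own request instead of `static_html`.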

Upvotes: 1

Ali Adhami

Reputation: 202

There is no "pdp-mod-product-badge-title" in page.content. The class to use is "breadcrumb_item_anchor_last", which you can find in View Source in your browser.

Code:

from bs4 import BeautifulSoup
import requests

page = requests.get('https://www.daraz.pk/products/qcy-t5-wireless-bluetooth-earphones-v50-touch-control-stereo-hd-talking-with-380mah-battery-i143388262-s1304364361.html?spm=a2a0e.searchlist.list.1.5b7c4a71Jr4QZb&search=1')
soup = BeautifulSoup(page.content, 'html.parser')
print(page.status_code)

heading = soup.find(class_='breadcrumb_item_anchor_last')

print(heading.text.strip())  # Thanks to @bigbounty
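Note that `soup.find` returns None when nothing matches, so calling `.text` on the result raises an AttributeError if the site changes its markup. A rough sketch of a guarded lookup follows; `first_text_by_class` is a hypothetical helper, and `sample_html` is a made-up fragment that only imitates the breadcrumb element, not the actual page.

```python
from bs4 import BeautifulSoup


def first_text_by_class(html, class_names):
    """Return the stripped text of the first element matching any class, else None."""
    soup = BeautifulSoup(html, "html.parser")
    for name in class_names:
        tag = soup.find(class_=name)
        if tag is not None:
            return tag.get_text(strip=True)
    return None


# Made-up fragment standing in for page.content.
sample_html = (
    '<ul><li><span class="breadcrumb_item_anchor_last"> QCY T5 </span></li></ul>'
)

title = first_text_by_class(
    sample_html,
    ["pdp-mod-product-badge-title", "breadcrumb_item_anchor_last"],
)
print(title)
```

The helper tries each class in order, so the class seen in the Inspect tab can stay first in case it ever shows up in the static HTML, with the breadcrumb class as the fallback.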

Upvotes: 2
