Reputation: 31
I am new to web scraping. I am trying to scrape the heading (QCY T5 Wireless Bluetooth Earphones V5.0 Touch Control Stereo HD talking with 380mAh battery) using BeautifulSoup, but the output is None. Here is the code I have tried:
from bs4 import BeautifulSoup
import requests
page=requests.get('https://www.daraz.pk/products/qcy-t5-wireless-bluetooth-earphones-v50-touch-control-stereo-hd-talking-with-380mah-battery-i143388262-s1304364361.html?spm=a2a0e.searchlist.list.1.5b7c4a71Jr4QZb&search=1')
soup=BeautifulSoup(page.content,'html.parser')
print(page.status_code)
heading=soup.find(class_='pdp-mod-product-badge-title')
print(heading)
The HTML from the website:
<div class="pdp-mod-product-badge-wrapper"><span class="pdp-mod-product-badge-title" data-spm-anchor-id="a2a0e.pdp.0.i0.4f257123ixGMNY">QCY T5 Wireless Bluetooth Earphones V5.0 Touch Control Stereo HD talking with 380mAh battery</span></div>
Upvotes: 0
Views: 90
Reputation: 53
You cannot fetch the data because the page source (View Source) does not contain the class you mentioned.
A common beginner mistake is to pick classes for scraping from the Inspect tab, which shows the DOM after JavaScript has run. Don't rely on it alone.
To see what requests actually receives, open View Source with Ctrl + U and search for your content there. In many cases the content is rendered dynamically by a JS file and API calls, which you can find in the Network tab.
That is the case here too: the element you are looking for is loaded dynamically and is not present in the page's source code.
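The View Source check can also be done from Python. This is only a rough sketch: `missing_from_source` and `fetch_html` are hypothetical helper names, and the sample HTML is a stand-in rather than the real page. It is a plain substring search, so a name that appears only inside a script would still count as present.

```python
def missing_from_source(html, class_names):
    """Return the class names that never appear in the raw HTML text."""
    return [name for name in class_names if name not in html]


def fetch_html(url):
    """Download the page the way the scraper does (no JavaScript runs)."""
    import requests  # imported here so the helper above works without it
    return requests.get(url, timeout=10).text


# Stand-in for the downloaded page (an assumption, not the real markup);
# replace with fetch_html(url) to check the live site.
sample_html = '<a class="breadcrumb_item_anchor_last">QCY T5</a>'
wanted = ["pdp-mod-product-badge-title", "breadcrumb_item_anchor_last"]

# Any name printed here exists only after JavaScript has run, if at all.
print(missing_from_source(sample_html, wanted))
```

If a class shows up in the Inspect tab but is reported missing here, BeautifulSoup will not find it in `page.content` either.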
Upvotes: 1
Reputation: 202
There is no "pdp-mod-product-badge-title" in page.content. The class to use is "breadcrumb_item_anchor_last", which you can find via View Source in your browser.
Code:
from bs4 import BeautifulSoup
import requests
page=requests.get('https://www.daraz.pk/products/qcy-t5-wireless-bluetooth-earphones-v50-touch-control-stereo-hd-talking-with-380mah-battery-i143388262-s1304364361.html?spm=a2a0e.searchlist.list.1.5b7c4a71Jr4QZb&search=1')
soup=BeautifulSoup(page.content,'html.parser')
print(page.status_code)
heading=soup.find(class_='breadcrumb_item_anchor_last')
print(heading.text.strip())  # Thanks to @bigbounty
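Since `soup.find` returns None when nothing matches, calling `.text` on the result can raise an AttributeError if the site changes its markup. Here is a minimal sketch of a more defensive lookup. `text_by_class` is a hypothetical helper, and the sample HTML is a made-up stand-in for `page.content`.

```python
from bs4 import BeautifulSoup


def text_by_class(html, class_name):
    """Return the stripped text of the first element with this class, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(class_=class_name)
    if element is None:
        # The class is not in the static HTML (for example, it is added by JS).
        return None
    return element.get_text(strip=True)


# Made-up snippet for illustration; pass page.content in real use.
sample_html = (
    '<ul><li><span class="breadcrumb_item_anchor_last">'
    " QCY T5 Wireless </span></li></ul>"
)

print(text_by_class(sample_html, "breadcrumb_item_anchor_last"))
print(text_by_class(sample_html, "pdp-mod-product-badge-title"))
```

The first call prints the stripped text and the second prints None, which is the same symptom as in the question.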
Upvotes: 2